A PROBABILISTIC STUDY OF NEURAL COMPLEXITY 



J. BUZZI AND L. ZAMBOTTI 

Abstract. G. Edelman, O. Sporns, and G. Tononi have introduced the neural 
complexity of a family of random variables, defining it as a specific average of 
mutual information over subfamilies. We show that their choice of weights satisfies 
two natural properties, namely exchangeability and additivity, and we call any 
functional satisfying these two properties an intricacy. We classify all intricacies 
in terms of probability laws on the unit interval and study the growth rate of 
maximal intricacies when the size of the system goes to infinity. For systems of a 
fixed size, we show that maximizers have small support and exchangeable systems 
have small intricacy. In particular, maximizing intricacy leads to spontaneous 
symmetry breaking and failure of uniqueness. 



1. Introduction 

1.1. A functional over random systems. Natural sciences have to deal with 
"complex systems" in some obvious and not so obvious meanings. Such notions 
first appeared in thermodynamics. Entropy is now recognized as the fundamental 
measure of complexity in the sense of randomness and it is playing a key role as well 
in information theory, probability and dynamics [12]. Much more recently, subtler 
forms of complexity have been considered in various physical problems [H [3l [71 [11] , 
though there does not seem to be a single satisfactory measure yet. 

Related questions also arise in biology. In their study of high-level neural net- 
works, G. Edelman, O. Sporns and G. Tononi have argued that the relevant com- 
plexity should be a combination of high integration and high differentiation. In [22] 
they have introduced a quantitative measure of this kind of complexity under the 
name of neural complexity. As we shall see, this concept is strikingly general and 
has interesting mathematical properties. 

In the biological [IOl[T3l[Tll[l6l[I7l[l8l[l9l[20l[23l[2l]and physical [21 [8] literature, 
several authors have used numerical experiments based on Gaussian approximations 
and simple examples to suggest that high values of this neural complexity are indeed 
associated with non-trivial organization of the network, away both from complete 
disorder (maximal entropy and independence of the neurons) and complete order 
(zero entropy, i.e., complete determinacy).

The aim of this paper is to provide a mathematical foundation for the
Edelman-Sporns-Tononi complexity. Indeed, it turns out to belong to a natural class of
functionals: the averages of mutual informations satisfying exchangeability and weak
additivity (see below and the Appendix for the needed facts of information theory).
The former property means that the functional is invariant under permutations of
the system, the latter that it is additive over independent systems. We call these
functionals intricacies and give a unified probabilistic representation of them.

One of the main thrusts of the above-mentioned work is to understand what systems
with large neural complexity look like. From a mathematical point of view, this
translates into the study of the maximization of such functionals (under appropriate
constraints).

This maximization problem is interesting because of the trade-off between high 
entropy and strong dependence which are both required for large mutual informa- 
tion. Such frustration occurs in spin glass theory [21] and leads to asymmetric 
and non-unique maximizers. However, contrary to that problem, our functional is
completely deterministic and the symmetry breaking (in the language of theoretical 
physics) occurs in the maximization itself: we show that the maximizers are not ex- 
changeable although the functional is. We also estimate the growth of the maximal 
intricacy of finite systems with size going to infinity and the size of the support of 
maximizers. 

The computation of the exact growth rate of the intricacy as a function of the size 
and the analysis of systems with almost maximal intricacies build on the techniques 
of this paper, especially the probabilistic representation below, but require additional 
ideas, and are therefore deferred to another paper [5].

1.2. Intricacy. We recall that the entropy of a random variable X taking values in 
a finite or countable space E is defined by 

\[ H(X) := -\sum_{x \in E} P_X(x) \log(P_X(x)), \qquad P_X(x) := \mathbb{P}(X = x). \]

Given two discrete random variables $X$ and $Y$ defined over the same probability space,
the mutual information between $X$ and $Y$ is

\[ \mathrm{MI}(X, Y) := H(X) + H(Y) - H(X, Y). \]

We refer to the appendix for a review of the main properties of the entropy and 
the mutual information and to [6] and [12] for introductions to information theory 
and to the various roles of entropy in mathematical physics, respectively. For now, 
it suffices to recall that $\mathrm{MI}(X, Y) \ge 0$, with equality if and only if X and Y are
independent, and therefore MI(X, Y) is a measure of the dependence between X 
and Y. 

Edelman, Sporns and Tononi [22] consider systems formed by a finite family
$X = (X_i)_{i \in I}$ of random variables and define the following concept of complexity.
For any $S \subset I$, they divide the system in two families

\[ X_S := (X_i,\ i \in S), \qquad X_{S^c} := (X_i,\ i \in S^c), \]

where $S^c := I \setminus S$. Then they compute the mutual informations
$\mathrm{MI}(X_S, X_{S^c})$ and consider an average of these:

\[ \mathcal{I}(X) := \frac{1}{|I|+1} \sum_{S \subset I} \frac{1}{\binom{|I|}{|S|}}\, \mathrm{MI}(X_S, X_{S^c}), \tag{1.1} \]

where $|I|$ denotes the cardinality of $I$ and $\binom{n}{k}$ is the binomial coefficient.
Note that $\mathcal{I}(X)$ is really a function of the law of $X$ and not of its random values.

The above formula can be read as the expectation of the mutual information between
a random subsystem $X_S$ and its complement $X_{S^c}$, where one chooses uniformly
the size $k \in \{0, \dots, |I|\}$ and then a subset $S \subset I$ of size $|S| = k$.
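As an illustration of formula (1.1), the following minimal Python sketch evaluates the neural complexity of a small system by brute force. It assumes the law is given as a dictionary mapping configurations (tuples) to probabilities; the function names `entropy`, `mutual_information` and `neural_complexity` are ours and purely illustrative. The enumeration of all subsets makes the cost exponential in the size of the system, so this is only meant for very small examples.

```python
from itertools import combinations
from math import comb, log

def entropy(law, idx):
    """Shannon entropy (natural log) of the marginal of `law` on coordinates `idx`."""
    marginal = {}
    for config, p in law.items():
        key = tuple(config[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log(p) for p in marginal.values() if p > 0)

def mutual_information(law, n, subset):
    """MI(X_S, X_{S^c}) = H(X_S) + H(X_{S^c}) - H(X) for a subset S of range(n)."""
    complement = tuple(i for i in range(n) if i not in subset)
    full = tuple(range(n))
    return entropy(law, subset) + entropy(law, complement) - entropy(law, full)

def neural_complexity(law, n):
    """Average (1.1): uniform over the size k, then uniform over subsets of size k."""
    total = 0.0
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            total += mutual_information(law, n, subset) / comb(n, k)
    return total / (n + 1)

# Two copies of one fair bit (N = 2, d = 2): strongly dependent.
copied_bit = {(0, 0): 0.5, (1, 1): 0.5}
# Two independent fair bits: every mutual information vanishes.
independent_bits = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
```

For the two copied bits this gives $(\log 2)/3$, in agreement with Example 2.9, while the independent pair has neural complexity zero.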

In this paper we prove that I fits into a natural class of functionals, which we 
call intricacies. We shall see that these functionals have very similar, though not 
identical properties and admit a natural and technically very useful probabilistic 
representation by means of a probability measure on [0, 1]. 

Notice that $\mathcal{I} \ge 0$ and $\mathcal{I} = 0$ if and only if the system is an
independent family (see Lemma 3.9 below). In particular, both complete order (a
deterministic family X) and total disorder (an independent family) imply that every
mutual information vanishes and therefore $\mathcal{I}(X) = 0$.

On the other hand, to make (1.1) large, X must simultaneously display two different
behaviors: a non-trivial correlation between its subsystems and a large number of
internal degrees of freedom. This is the hallmark of complexity according to Edelman,
Sporns and Tononi. The need to strike a balance between local independence
and global dependence makes such systems not so easy to build (see however
Example 2.10 and Remark 2.11 below for a simple case). This is the main point of our
work.

1.3. Intricacies. Throughout this paper, a system is a finite collection $(X_i)_{i \in I}$ of
random variables, each $X_i$, $i \in I$, taking values in the same finite set, say
$\{0, \dots, d-1\}$ with $d \ge 2$ given. Without loss of generality, we suppose that $I$ is
a subset of the positive integers or simply $\{1, \dots, N\}$. In this case it is convenient
to write $N$ for $I$.

We let $\mathcal{X}(d, I)$ be the set of such systems and $\mathcal{M}(d, I)$ the set of the
corresponding laws, that is, all probability measures on $\{0, \dots, d-1\}^I$ for any finite
subset $I$. We often identify it with $\mathcal{M}(d, N) := \mathcal{M}(d, \{1, \dots, N\})$ for
$N = |I|$. If $X$ is such a system with law $\mu$, we denote its entropy by
$H(X) = H(\mu)$. Of course, entropy is in fact a (deterministic) function of the law $\mu$
of $X$ and not of the (random) values of $X$.

Intricacies are functionals over such systems (more precisely: over their laws)
formalizing and generalizing the neural complexity (1.1) of Edelman-Sporns-Tononi
[22]:



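Definition 1.1. A system of coefficients is a family of numbers

\[ c := (c^I_S : I \subset\subset \mathbb{N}^*,\ S \subset I) \]

satisfying, for all $I$ and all $S \subset I$:

\[ c^I_S \ge 0, \qquad \sum_{S \subset I} c^I_S = 1, \qquad \text{and} \qquad c^I_{S^c} = c^I_S, \tag{1.2} \]

where $S^c := I \setminus S$. We denote the set of such systems by $\mathcal{C}(\mathbb{N}^*)$.

The corresponding mutual information functional is $\mathcal{I}^c : \mathcal{X} \to \mathbb{R}$ defined by:

\[ \mathcal{I}^c(X) := \sum_{S \subset I} c^I_S\, \mathrm{MI}(X_S, X_{S^c}). \]

By convention, $\mathrm{MI}(X_\emptyset, X_I) = \mathrm{MI}(X_I, X_\emptyset) = 0$. If
$X \in \mathcal{X}(d, I)$ has law $\mu$, we denote $\mathcal{I}^c(X) = \mathcal{I}^c(\mu)$.
$\mathcal{I}^c$ is non-null if some coefficient $c^I_S$ with $S \notin \{\emptyset, I\}$ is not
zero.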

An intricacy is a mutual information functional satisfying:

(1) exchangeability (invariance by permutations): if $I, J \subset\subset \mathbb{N}^*$ and
$\phi : I \to J$ is a bijection, then $\mathcal{I}^c(X) = \mathcal{I}^c(Y)$ for any
$X := (X_i)_{i \in I}$, $Y := (X_{\phi^{-1}(j)})_{j \in J}$;

(2) weak additivity: $\mathcal{I}^c(X, Y) = \mathcal{I}^c(X) + \mathcal{I}^c(Y)$ for any two
independent systems $(X_i)_{i \in I}$, $(Y_j)_{j \in J}$.

Clearly, by (1.1), neural complexity is a mutual information functional with
$c^I_S = \frac{1}{|I|+1} \frac{1}{\binom{|I|}{|S|}}$ satisfying exchangeability. Weak additivity
is less trivial and will be deduced in Theorem 1.2 below. We remark that the factor
$(|I| + 1)$ in the denominator is not present in the original definition in [22] but is
necessary for weak additivity and the normalization (1.2) to hold.

1.4. Main results. Our first result is a characterization of systems of coefficients c
generating an intricacy, i.e. an exchangeable and weakly additive mutual information
functional. These properties are equivalent to a probabilistic representation of c.
We say that a probability measure $\lambda$ on $[0, 1]$ is symmetric if
$\int_{[0,1]} f(x)\, \lambda(dx) = \int_{[0,1]} f(1 - x)\, \lambda(dx)$ for all measurable and
bounded functions $f$.

Theorem 1.2. Let $c \in \mathcal{C}(\mathbb{N}^*)$ be a system of coefficients and
$\mathcal{I}^c$ the associated mutual information functional.

(1) $\mathcal{I}^c$ is an intricacy, i.e. exchangeable and weakly additive, if and only if there
exists a symmetric probability measure $\lambda_c$ on $[0, 1]$ such that

\[ c^I_S = \int_{[0,1]} x^{|S|} (1-x)^{|I|-|S|}\, \lambda_c(dx), \qquad \forall\, S \subset I. \tag{1.3} \]

In this case, if $\{W_c, Y_i, i \in \mathbb{N}^*\}$ is an independent family such that $W_c$ has
law $\lambda_c$ and $Y_i$ is uniform on $[0, 1]$, then

\[ c^I_S = \mathbb{P}(Z \cap I = S), \qquad \forall\, I \subset\subset \mathbb{N}^*,\ \forall\, S \subset I, \]

where $Z$ is the random subset of $\mathbb{N}^*$

\[ Z := \{ i \in \mathbb{N}^* : Y_i \ge W_c \}. \]

(2) $\lambda_c$ is uniquely determined by $\mathcal{I}^c$. Moreover $\mathcal{I}^c$ is non-null iff
$\lambda_c(]0, 1[) > 0$ and in this case $c^I_S > 0$ for all coefficients with $S \subset I$,
$S \notin \{\emptyset, I\}$.

(3) For the neural complexity (1.1), we have

\[ \frac{1}{|I|+1} \frac{1}{\binom{|I|}{|S|}} = \int_{[0,1]} x^{|S|} (1-x)^{|I|-|S|}\, dx, \]

i.e., $\lambda_c$ in this case is the Lebesgue measure on $[0, 1]$ and the neural complexity
is indeed exchangeable and weakly additive, i.e. an intricacy.
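The representation (1.3) can be checked numerically in the two explicit cases discussed in this paper. The sketch below is only an illustration under stated assumptions: the function names are ours, and we assume that the p-symmetric coefficients of Definition 2.1 correspond to the symmetric law $(\delta_p + \delta_{1-p})/2$, which is what (1.3) gives for that measure. Exact rational arithmetic avoids any rounding issue.

```python
from fractions import Fraction
from math import comb, factorial

def est_coefficient(n, k):
    """Neural-complexity weight of a subset of size k in a system of size n."""
    return Fraction(1, (n + 1) * comb(n, k))

def lebesgue_moment(n, k):
    """Exact Beta integral of x^k (1-x)^(n-k) over [0, 1]: k! (n-k)! / (n+1)!."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

def p_symmetric_coefficient(n, k, p):
    """Formula (1.3) for the (assumed) law (delta_p + delta_{1-p}) / 2."""
    q = 1 - p
    return (p**k * q**(n - k) + q**k * p**(n - k)) / 2

def total_weight(coefficient, n):
    """Sum of the weights over all subsets: comb(n, k) subsets of each size k."""
    return sum(comb(n, k) * coefficient(n, k) for k in range(n + 1))
```

With these definitions, `est_coefficient(n, k) == lebesgue_moment(n, k)` for all sizes, and both families of weights sum to 1 over the subsets of a system, as required by the normalization (1.2).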

We discuss other explicit examples in section 2 below. 

Our next result concerns the maximal value of intricacies. As discussed above, 
this is a subtle issue since large intricacy values require compromises. This can also 
be seen in that intricacies are differences between entropies, see (2.2), and are therefore
not concave. 

The weak additivity of intricacies is the key to how they grow with the size of 
the system. This property of neural complexity having been brought to the fore, we 
obtain linear growth and convergence of the growth speed quite easily. The same 
holds subject to an entropy condition, independently of the softness of the constraint 
(measured below by the speed at which $\delta_N$ converges to 0).

Denote by $\mathcal{I}^c(d, N)$ and $\mathcal{I}^c(d, N, x)$, $x \in [0, 1]$, the supremum of
$\mathcal{I}^c(X)$ over all $X \in \mathcal{X}(d, N)$, respectively over all
$X \in \mathcal{X}(d, N)$ such that $H(X) = xN \log d$:

\[ \mathcal{I}^c(d, N) := \sup\{ \mathcal{I}^c(\mu) : \mu \in \mathcal{M}(d, N) \}, \tag{1.4} \]

\[ \mathcal{I}^c(d, N, x) := \sup\{ \mathcal{I}^c(\mu) : \mu \in \mathcal{M}(d, N),\ H(\mu) = xN \log d \}. \tag{1.5} \]

Notice that if $x = 0$ or $x = 1$, then $\mathcal{I}^c(d, N, x) = 0$, since this corresponds to,
respectively, deterministic or independent systems, for which all mutual information
functionals vanish.

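Theorem 1.3. Let $\mathcal{I}^c$ be a non-null intricacy and let $d \ge 2$ be some integer.

(1) The following limits exist for all $x \in [0, 1]$

\[ \mathcal{I}^c(d) := \lim_{N \to \infty} \frac{\mathcal{I}^c(d, N)}{N}, \qquad \mathcal{I}^c(d, x) := \lim_{N \to \infty} \frac{\mathcal{I}^c(d, N, x)}{N}, \tag{1.6} \]

and we have the bounds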

\[ x(1-x)\, \kappa_c \le \frac{\mathcal{I}^c(d, x)}{\log d} \le \frac{\mathcal{I}^c(d)}{\log d} \le \frac{1}{2}, \tag{1.7} \]

where

\[ \kappa_c := 2 \int_{[0,1]} y(1-y)\, \lambda_c(dy) > 0, \tag{1.8} \]

and $\lambda_c$ is defined in Theorem 1.2.

(2) Let $(\delta_N)_{N \ge 1}$ be any sequence of non-negative numbers converging to zero and
$x \in [0, 1]$. Then

\[ \mathcal{I}^c(d, x) = \lim_{N \to \infty} \frac{1}{N} \sup\left\{ \mathcal{I}^c(X) : X \in \mathcal{X}(d, N),\ \left| \frac{H(X)}{N \log d} - x \right| \le \delta_N \right\}. \]

Remark 1.4.

1. By considering a set of independent, identically distributed (i.i.d. for short)
random variables on $\{0, \dots, d-1\}$, it is easy to see that for any $0 \le h \le N \log d$,
there is $X \in \mathcal{X}(d, N)$ such that $H(X) = h$ and $\mathcal{I}^c(X) = 0$. Hence
minimization of intricacies is a trivial problem also under fixed entropy.

2. It follows that for any $(x, y)$, $0 \le x \le 1$, such that
$0 \le y < \mathcal{I}^c(d, x)/\log d$, for any $N$ large enough, there exists
$X \in \mathcal{X}(d, N)$ with $H(X) = xN \log d$ and $\mathcal{I}^c(X) = yN \log d$. Observe,
for instance, that $\mathcal{I}^c$ is continuous on the contractible space $\mathcal{M}(d, N)$.

3. In the above theorem, the assumption that each variable $X_i$ takes values in a
set of cardinality $d$ can be relaxed to $H(X_i) \le \log d$. It can be shown that this does
not change $\mathcal{I}^c(d)$ or $\mathcal{I}^c(d, x)$.

Thus maximal intricacy grows linearly in the size of the system. What happens 
if we restrict to smaller classes of systems, enjoying particular symmetries? Since 
intricacies are exchangeable, their value does not change if we permute the variables 
of a system. Therefore it is particularly natural to consider (finite) exchangeable 
families. 

We denote by $\mathrm{EX}(d, N)$ the set of random variables $X \in \mathcal{X}(d, N)$ which
are exchangeable, i.e., for all permutations $\sigma$ of $\{1, \dots, N\}$,
$X := (X_1, \dots, X_N)$ and $X_\sigma := (X_{\sigma(1)}, \dots, X_{\sigma(N)})$ have the same law.

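Theorem 1.5. Let $\mathcal{I}^c$ be an intricacy.

(1) Exchangeable systems have small intricacies. More precisely

\[ \sup_{X \in \mathrm{EX}(d, N)} \mathcal{I}^c(X) = o(N^{2/3 + \epsilon}), \qquad N \to +\infty, \]

for any $\epsilon > 0$. In particular

\[ \lim_{N \to \infty} \frac{1}{N} \max_{X \in \mathrm{EX}(d, N)} \mathcal{I}^c(X) = 0. \]

(2) For $N$ large enough and fixed $d$, maximizers of
$\mathcal{X}(d, N) \ni X \mapsto \mathcal{I}^c(X)$ are neither unique nor exchangeable.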

By the first assertion, exchangeability of the intricacies is not inherited by their
maximizers. Indeed, exchangeable systems are very far from maximizing, since the
maximum of $\mathcal{I}^c$ over $\mathrm{EX}(d, N)$ is $o(N^p)$ for any $p > 2/3$ whereas the
maximum of $\mathcal{I}^c$ over $\mathcal{X}(d, N)$ is proportional to $N$. This "spontaneous
symmetry breaking" again suggests the complexity of the maximizers. We remark that
numerical estimates suggest that the intricacy of any $X \in \mathrm{EX}(d, N)$ is in fact
bounded by const $\log N$.

The second assertion of Theorem 1.5 follows from the first one: for $N$ sufficiently
large, the maximal intricacy is not attained at an exchangeable law; therefore, by
permuting a system with maximal intricacy we obtain different laws, all with the
same maximal intricacy.

We finally turn to a property of exact maximizers, namely that their support is
concentrated on a small subset of all possible configurations:

Theorem 1.6. Let $\mathcal{I}^c$ be a non-null intricacy and let $d \ge 2$. For $N$ a large
enough integer, the following holds. For any $X$ maximizing $\mathcal{I}^c$ over
$\mathcal{X}(d, N)$, the law $\mu$ of $X$ has small support, i.e.

e Ad,N ■■ KW}) = 0} > const 

for some const > 0. 

1.5. Further questions. As noted above, the exact computation of the functions
$\mathcal{I}^c(d)$ and $\mathcal{I}^c(d, x)$ from Theorem 1.3 in terms of their probabilistic
representation from Theorem 1.2 will be the subject of [5] where we shall study systems
with intricacy close to the maximum.

Second, to apply intricacy one needs to compute it for systems of interest. It
might be possible to compute it exactly for some simple physical systems, like the
Ising model. A more ambitious goal would be to consider more complex models,
like spin glasses, to analyze the possible relation between intricacy and frustration
[21].

A more general approach would be to get rigorous estimates from numerical ones
(see [22] for some rough computations). A naive approach results in an exponential
complexity and thus raises the question of more efficient algorithms, perhaps
probabilistic ones. A related question is the design of statistical estimators for
intricacies. These estimators should be able to detect many-variable correlations, which
might require a priori assumptions on the systems.

Third, one would like to understand the intricacy from a dynamical point of view:
which physically reasonable processes (say with dynamics defined in terms of local 
rules) can lead to high intricacy systems and at what speeds? 

Fourthly, one could consider the natural generalization of intricacies, already
proposed in [22] but not explored further, given in terms of general partitions $\pi$ of $I$:
if $\pi = \{S_1, \dots, S_k\}$ with $\cup_i S_i = I$ and $S_i \cap S_j = \emptyset$ for $i \ne j$,
then we can set

\[ \mathrm{MI}(X_\pi) := H(X_{S_1}) + \cdots + H(X_{S_k}) - H(X), \qquad X \in \mathcal{X}(d, I), \tag{1.9} \]

and for some non-negative coefficients $(c_\pi)_\pi$

\[ \mathcal{J}^c(X) := \sum_\pi c_\pi\, \mathrm{MI}(X_\pi). \tag{1.10} \]

Most results of this paper extend to the case where the coefficients $(c_\pi)_\pi$ have a
probabilistic representation in terms of the so-called Kingman paintbox construction
H §2.3], see Remark 3.3 below.

One might also be interested to extend the definition of intricacy to infinite (e.g., 
stationary) processes, continuous or structured systems, e.g., taking into account a 
connectivity or dependence graph (such constraints have been considered in numer- 
ical experiments performed by several authors [21 El [IB])- 

Finally, our work leaves out the properties of exact maximizers for a given size.
As of now, we have no description of them except in very special cases (see Examples
2.9 and 2.10 below) and we do not know how many there are, or even if they are
always in finite number. We do not have reasonably efficient ways to determine
the maximizers, which we expect to lack a simple description in light of the lack of
symmetry established in Theorem 1.5.

1.6. Organization of the paper. In Sec. 2, we discuss the definition of intricacies,
giving some basic properties and examples. Sec. 3 proves Theorem 1.2, translating
the weak additivity of an intricacy into a property of its coefficients. As a
by-product, we obtain a probabilistic representation of all intricacies. We check that
neural complexity corresponds to the uniform law on [0,1]. In Sec. 4 we prove
Theorem 1.3 by showing the existence of the limits $\mathcal{I}^c(d)$, $\mathcal{I}^c(d, x)$.
Finally, in Sec. 5 we prove Theorem 1.5 and, in Sec. 6, Theorem 1.6. An Appendix recalls
some basic facts from information theory for the convenience of the reader and to fix
notations.

2. Intricacies 

2.1. Definition. We begin by a discussion of Definition 1.1 above of intricacies.
As $\mathrm{MI}(X_S, X_{S^c}) = \mathrm{MI}(X_{S^c}, X_S)$, the symmetry condition
$c^I_{S^c} = c^I_S$ can always be satisfied by replacing $c^I_S$ with
$\frac{1}{2}(c^I_S + c^I_{S^c})$ without changing the functional. Also
$\sum_{S \subset I} c^I_S = 1$ is simply an irrelevant normalization when studying systems
with a given index set $I$.

The following mutual information functionals will be proved to be intricacies in
Section 3.

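Definition 2.1. The intricacy $\mathcal{I}$ of Edelman-Sporns-Tononi is defined by its
coefficients:

\[ c^I_S = \frac{1}{|I|+1} \frac{1}{\binom{|I|}{|S|}}. \]

For $0 < p < 1$, the p-symmetric intricacy $\mathcal{I}^p(X)$ is:

\[ c^I_S = \frac{1}{2} \left( p^{|S|} (1-p)^{|I \setminus S|} + (1-p)^{|S|} p^{|I \setminus S|} \right). \]

For $p = 1/2$, this is the uniform intricacy $\mathcal{I}^U(X)$ with:

\[ c^I_S = 2^{-|I|}. \]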

It is not obvious that the three above mutual information functionals are weakly
additive, but this will follow easily from Lemma 3.7 below. Proposition 3.5 below
describes all intricacies.

Remark 2.2. The coefficients of the Edelman-Sporns-Tononi intricacy I ensure 
that subsystems of all sizes contribute significantly to the intricacy. This is in sharp 
contrast to the p-symmetric coefficients for which subsystems of size far from pN or 
$(1-p)N$ give a vanishing contribution when $N$ gets large.

Remark 2.3. The global $1/(|I| + 1)$ factor in $\mathcal{I}$ is not present in [22], which did not
compare systems of different sizes. However it is required for weak additivity. 

2.2. Basic Properties. We prove some general and easy properties of intricacies.
Recall that $\mathcal{X}(d, N)$ is the set of $\Lambda_{d,N}$-valued random variables, where
$\Lambda_{d,N} = \{0, \dots, d-1\}^N$. We identify it with the standard simplex in
$\mathbb{R}^{d^N}$ in the obvious way.

Lemma 2.4. Let $\mathcal{I}^c$ be a mutual information functional. For each $d \ge 2$ and
$N \ge 1$, $\mathcal{I}^c : \mathcal{M}(d, N) \to \mathbb{R}$ is continuous. In particular, the
suprema $\mathcal{I}^c(d, N)$ and $\mathcal{I}^c(d, N, x)$, introduced in (1.4) and (1.5), are
achieved.

If $\mathcal{I}^c$ is a non-null intricacy, then it is neither convex nor concave.

Proof. Continuity is obvious and existence of the maximum follows from the
compactness of the finite-dimensional simplex $\mathcal{M}(d, N)$. To disprove convexity and
concavity of non-null intricacies, we use the following examples. Pick $I$ with at
least two elements, say 1 and 2. Observe that
$K := c^{\{1,2\}}_{\{1\}} + c^{\{1,2\}}_{\{2\}}$ is positive by the non-degeneracy of
$\mathcal{I}^c$ (see Lemma 3.8 below). Fix $d \ge 2$.

First, for $i = 0, 1$, let $\mu_i$ over $\{0, \dots, d-1\}^I$ be defined by
$\mu_i(i, i, 0, \dots, 0) = 1$. We have:

\[ \mathcal{I}^c(\mu_0) = \mathcal{I}^c(\mu_1) = 0 < K \log 2 = \mathcal{I}^c\left( \frac{\mu_0 + \mu_1}{2} \right), \]

so that $\mathcal{I}^c$ is not convex.



Second, let $\nu_0$ be defined by $\nu_0(0, 0, 0, \dots, 0) = \nu_0(1, 1, 0, \dots, 0) = 1/2$ and
$\nu_1$ by $\nu_1(0, 1, 0, \dots, 0) = \nu_1(1, 0, 0, \dots, 0) = 1/2$. We have:

\[ \mathcal{I}^c(\nu_0) = \mathcal{I}^c(\nu_1) = K \log 2 > 0 = \mathcal{I}^c\left( \frac{\nu_0 + \nu_1}{2} \right), \]

so that $\mathcal{I}^c$ is not concave.



□ 

The following expression of an intricacy as a non-convex combination of the entropy 
of subsystems is crucial to its understanding. 

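Lemma 2.5. For any intricacy $\mathcal{I}^c$ and $X \in \mathcal{X}(d, N)$

\[ \mathcal{I}^c(X) = 2 \left( \sum_{S \subset I} c^I_S\, H(X_S) \right) - H(X). \tag{2.2} \]

Proof. The result readily follows from: $\mathrm{MI}(X, Y) = H(X) + H(Y) - H(X, Y)$,
$c^I_S = c^I_{S^c}$, and $\sum_S c^I_S = 1$. □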
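For the reader's convenience, the computation behind (2.2) can be written out as follows. This is only a sketch of the one-line proof above: it uses $H(X_S, X_{S^c}) = H(X)$, and the middle sum is re-indexed by $S \mapsto S^c$ using the symmetry of the coefficients.

```latex
\begin{align*}
\mathcal{I}^c(X)
  &= \sum_{S \subset I} c^I_S \bigl( H(X_S) + H(X_{S^c}) - H(X) \bigr) \\
  &= \sum_{S \subset I} c^I_S\, H(X_S)
     + \sum_{S \subset I} c^I_{S^c}\, H(X_{S^c})
     - H(X) \sum_{S \subset I} c^I_S \\
  &= 2 \sum_{S \subset I} c^I_S\, H(X_S) - H(X).
\end{align*}
```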

We introduce the notation

\[ \mathrm{MI}(S) := \mathrm{MI}(X_S, X_{I \setminus S}) \]

which will be used only when the understood dependence on $X$ and $I$ is clear.

Lemma 2.6. For any intricacy $\mathcal{I}^c$ and any system $X \in \mathcal{X}(d, N)$

\[ 0 \le \mathcal{I}^c(X) \le \frac{N}{2} \log d. \]

Proof. The inequalities follow from basic properties of the mutual information (see
the Appendix):

\[ 0 \le \mathrm{MI}(S) \le \min\{ H(X_S), H(X_{S^c}) \} \le \min\{ |S|, N - |S| \} \log d \le \frac{N}{2} \log d. \]

□ 

2.3. Simple examples. We give some examples of finite systems and compute their 
intricacies both for illustrative purposes and for their use in some proofs below. 

Let $X_i$ take values in $\{0, \dots, d-1\}$ for all $i \in I$, a finite subset of $\mathbb{N}^*$. The first
two examples show that total order and total disorder make the intricacy vanish. 

Example 2.7 (Total disorder). If the variables $X_i$ are independent then each mutual
information is zero and therefore: $\mathcal{I}^c(X) = 0$. □

Example 2.8 (Total order). If each $X_i$ is a.s. equal to a constant $c_i$ in
$\{0, \dots, d-1\}$, then, for any $S \ne \emptyset$, $H(X_S) = 0$. Hence,
$\mathcal{I}^c(X) = 0$. □

For N = 2,3, each mutual information can be maximized separately: there is no 
frustration and it is easy to determine the maximizers of non-null intricacies. 

Example 2.9 (Case N = 2). Let first $N = 2$ and $\mathcal{I}^c$ be a non-null intricacy. Then
by Theorem 1.2, $c^I_S = c^{|I|}_{|S|}$ and therefore

\[ \mathcal{I}^c(X) = \left( c^{\{1,2\}}_{\{1\}} + c^{\{1,2\}}_{\{2\}} \right) \mathrm{MI}(X_1, X_2) = 2 c^2_1\, \mathrm{MI}(X_1, X_2), \qquad X \in \mathcal{X}(d, 2), \]

and moreover $c^2_1 > 0$. Therefore the maximizers of $\mathcal{I}^c$ over $\mathcal{X}(d, 2)$
are the maximizers of $X \mapsto \mathrm{MI}(X_1, X_2)$. By the discussion in subsection A.3
of the appendix, we have that $\mathrm{MI}(X_1, X_2) \le \min\{ H(X_1), H(X_2) \}$. Now,
$\mathrm{MI}(X_1, X_2) = H(X_1) = H(X_2)$ iff each variable is a function of the other.

Therefore, the maximizers are exactly the following systems $X = (X_1, X_2)$: $X_1$ is
a uniform r.v. over $\{0, \dots, d-1\}$ and $X_2$ is a deterministic function of it,
$X_2 = \sigma(X_1)$ for a given permutation $\sigma$ of $\{0, \dots, d-1\}$. In the case of the
neural complexity, $\max_{X \in \mathcal{X}(d, 2)} \mathcal{I}(X) = (\log d)/3$. □

Example 2.10 (Case N = 3). Let $N = 3$ and $I := \{1, 2, 3\}$. By Theorem 1.2,
$c^I_S = c^{|I|}_{|S|}$, $c^3_1 = c^3_2$ and therefore

\[ \mathcal{I}^c(X) = 2 c^3_1 \left( \mathrm{MI}(X_1, X_{\{2,3\}}) + \mathrm{MI}(X_2, X_{\{1,3\}}) + \mathrm{MI}(X_3, X_{\{1,2\}}) \right), \]

and moreover $c^3_1 > 0$. Here we simultaneously maximize each of these mutual
informations. The optimal choice is a system $(X_1, X_2, X_3)$ where every pair
$(X_i, X_j)$, $i \ne j$, is uniform over $\{0, \dots, d-1\}^2$, and the third variable is a
function of $(X_i, X_j)$. This is realized iff $(X_1, X_2)$ is uniform over
$\{0, \dots, d-1\}^2$ and $X_3 = \phi(X_1, X_2)$, where $\phi$ is a (deterministic) map such
that, for any $i \in \{0, \dots, d-1\}$, $\phi(i, \cdot)$ and $\phi(\cdot, i)$ are permutations
of $\{0, \dots, d-1\}$. For instance: $\phi(x_1, x_2) = x_1 + x_2 \bmod d$. In the
case of the neural complexity, $\max_{X \in \mathcal{X}(d, 3)} \mathcal{I}(X) = (\log d)/2$. □

The maximizers of Examples 2.9 and 2.10 are very special. For instance, they are
exchangeable, contrary to the case of large $N$ according to Theorem 1.5. For $N = 4$
and beyond it is no longer possible to separately maximize each mutual information
and we do not have an explicit description of the maximizers. We shall however see
that, as in the above examples, maximizers have small support, see Theorem 1.6.

Remark 2.11. Example 2.10 has an interesting interpretation: for $N = 3$, a system
with large intricacy shows in a simple way a combination of differentiation and 
integration, as it is expected in the biological literature, see the Introduction. Indeed, 
any subsystem of two variables is independent (differentiation), while the whole 
system is correlated (integration). 
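The combination of differentiation and integration described in this remark can be checked directly on the modular-sum system of Example 2.10. The following sketch (function names are ours and purely illustrative) builds the law of $(X_1, X_2, X_1 + X_2 \bmod d)$ and tests whether each pair of coordinates is uniform, hence independent, while the full triple is not.

```python
from fractions import Fraction
from itertools import product

def modular_sum_law(d):
    """Law of (X1, X2, X1 + X2 mod d) with (X1, X2) uniform on {0,...,d-1}^2."""
    return {(a, b, (a + b) % d): Fraction(1, d * d)
            for a, b in product(range(d), repeat=2)}

def marginal(law, idx):
    """Marginal law of the coordinates listed in `idx`."""
    out = {}
    for config, p in law.items():
        key = tuple(config[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out

def is_uniform(dist, size):
    """True if `dist` charges exactly `size` points, each with mass 1/size."""
    return len(dist) == size and all(p == Fraction(1, size) for p in dist.values())
```

Every pair marginal is uniform on $d^2$ points, whereas the joint law is supported on only $d^2$ of the $d^3$ configurations.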

Another interesting case is that of a large system where one variable is free and 
all others follow it deterministically. 

Example 2.12 (A totally synchronized system). Let $X_1$ be a uniform
$\{0, \dots, d-1\}$-valued random variable. We define now
$(X_2, \dots, X_N) := \phi(X_1)$, where $\phi$ is any deterministic map from
$\{0, \dots, d-1\}$ to $\{0, \dots, d-1\}^{N-1}$. Then, for any $S \ne \emptyset$,
$H(X_S) = \log d$ and, if additionally $S^c \ne \emptyset$, $H(X_S | X_{S^c}) = 0$, so that
each mutual information $\mathrm{MI}(X_S, X_{S^c})$ is $\log d$ if $S \notin \{\emptyset, I\}$.
Hence,

\[ \mathcal{I}^c(X) = \sum_{S \subset I,\ S \notin \{\emptyset, I\}} c^I_S \cdot \log d = (1 - c^I_\emptyset - c^I_I) \log d. \qquad \Box \]

In the next example we build for every $x \in ]0, 1[$ a system $X \in \mathcal{X}(d, 2)$ with
entropy $H(X) = x \log d^2$ and positive intricacy.

Example 2.13 (A system with positive intricacy and arbitrary entropy). Let first
$x \in ]0, 1/2]$. Let $X_1$ be $\{0, \dots, d-1\}$-valued with $H(X_1) = 2x \log d$. Such a
variable exists because entropy is continuous over the connected simplex of probability
measures on $\{0, \dots, d-1\}$ and attains the values 0 over a Dirac mass and $\log d$ over
the uniform distribution. We define now $X_2 := X_1$ and
$X := (X_1, X_2) \in \mathcal{X}(d, 2)$. Therefore $H(X) = 2x \log d = x \log d^2$,
$\mathrm{MI}(X_1, X_2) = H(X_1) = 2x \log d$ and, arguing as in Example 2.9,

\[ \mathcal{I}^c(X) = 2 c^2_1\, \mathrm{MI}(X_1, X_2) = 4x\, c^2_1 \log d > 0. \]

Let now $x \in ]1/2, 1[$. Let $(Y_1, Y_2, B)$ be an independent triple such that $Y_i$ is
uniform over $\{0, \dots, d-1\}$ and $B$ is Bernoulli with parameter $p \in [0, 1]$ and set

\[ X_1 := Y_1, \qquad X_2 := 1_{(B=0)}\, Y_1 + 1_{(B=1)}\, Y_2, \qquad X := (X_1, X_2). \]

Then both $X_1$ and $X_2$ are uniform on $\{0, \dots, d-1\}$. On the other hand, it is easy to
see that $H(X)$, as a function of $p \in [0, 1]$, interpolates continuously between $\log d$
and $2 \log d$. Thus, for every $x \in ]1/2, 1[$ there is a $p \in [0, 1]$ such that
$H(X) = x \log d^2$. In this case $\mathrm{MI}(X_1, X_2) = 2(1 - x) \log d$ and we obtain

\[ \mathcal{I}^c(X) = 2 c^2_1\, \mathrm{MI}(X_1, X_2) = 4(1 - x)\, c^2_1 \log d > 0. \qquad \Box \]

Intricacy can indeed reach over $\mathcal{X}(d, N)$ the order $N$ of Lemma 2.6, as the next
example shows. 

Example 2.14 (Systems with uniform intricacy proportional to $N$). Let us fix
$d \ge 2$. For $N \ge 2$, we are going to build a system $(X_i)_{i \in I}$,
$I = \{1, \dots, N\}$, over the alphabet $\{0, \dots, d^2 - 1\}$ for which
$\mathcal{I}^U(X)/N$ converges to $(\log d^2)/4$; later, in Example 3.10, we shall
generalize this to an arbitrary intricacy.

Let $Y_1, \dots, Y_N$ be i.i.d. uniform $\{0, \dots, d-1\}$-valued random variables and define
$X_i := Y_i + d\, Y_{i+1}$ for $i = 1, \dots, N-1$, $X_N := Y_N$. Note that
$X \in \mathcal{X}(d^2, N)$ and $H(X) = N \log d = (N/2) \log d^2$. For $S \subset I$, set

\[ \Delta_S := \{ k = 1, \dots, N-1 : 1_S(k) \ne 1_S(k+1) \}, \]

\[ U_S := \{ k = 1, \dots, N-1 : 1_S(k) = 1 \ne 1_S(k+1) \}. \]

Observe that $H(X_S) = (|S| + |U_S|) \log d$. Indeed, this is given by $\log d$ times the
minimal number of $Y_k$ needed to define $X_S$; every $k \in S$ counts for one if
$k \in S \setminus U_S$, for two if $k \in U_S$; therefore we find
$|S| - |U_S| + 2|U_S| = |S| + |U_S|$. Moreover, $|U_S| + |U_{S^c}| = |\Delta_S|$. Therefore

\[ \mathrm{MI}(S) = (|U_S| + |S| + |U_{S^c}| + |S^c| - N) \log d = |\Delta_S| \log d. \]

Moreover the map $S \mapsto (1_S(1), \Delta_S)$ is a bijection from the subsets of $I$ onto
the product of $\{0, 1\}$ with the set of subsets of $\{1, \dots, N-1\}$.

Hence:

\[ \frac{\mathcal{I}^U(X)}{\log d} = 2^{-N} \sum_{S \subset I} |\Delta_S| = 2^{-N} \times 2 \sum_{\Delta \subset \{1, \dots, N-1\}} |\Delta| = 2^{-N+1} \sum_{k=0}^{N-1} k \binom{N-1}{k} = 2^{-N+1} (N-1)\, 2^{N-2} = \frac{N-1}{2}. \]

Therefore for this $X \in \mathcal{X}(d^2, N)$:

\[ \mathcal{I}^U(X) = \frac{N-1}{4} \log(d^2). \qquad \Box \]
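The value obtained in Example 2.14 can be confirmed by brute force for small sizes. The sketch below is an illustration only (function names are ours): it builds the law of the chain $X_i = Y_i + d\,Y_{i+1}$, $X_N = Y_N$, and evaluates the uniform intricacy with coefficients $2^{-N}$ by enumerating all subsets, which is feasible only for very small $N$ and $d$.

```python
from itertools import product
from math import log

def entropy(law, idx):
    """Shannon entropy (natural log) of the marginal of `law` on coordinates `idx`."""
    marginal = {}
    for config, p in law.items():
        key = tuple(config[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log(p) for p in marginal.values() if p > 0)

def uniform_intricacy(law, n):
    """Mutual information functional with all coefficients equal to 2^(-n)."""
    full = tuple(range(n))
    h_full = entropy(law, full)
    total = 0.0
    for mask in product((0, 1), repeat=n):
        subset = tuple(i for i in full if mask[i])
        complement = tuple(i for i in full if not mask[i])
        total += entropy(law, subset) + entropy(law, complement) - h_full
    return total / 2**n

def chain_system(n, d):
    """Law of X_i = Y_i + d*Y_{i+1} (i < n), X_n = Y_n, with Y i.i.d. uniform."""
    law = {}
    for y in product(range(d), repeat=n):
        x = tuple(y[i] + d * y[i + 1] for i in range(n - 1)) + (y[n - 1],)
        law[x] = law.get(x, 0.0) + 1.0 / d**n
    return law
```

For instance, `uniform_intricacy(chain_system(4, 2), 4)` should agree with $\frac{N-1}{2} \log d = 1.5 \log 2$ up to rounding.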

The following example will be useful to show that an intricacy $\mathcal{I}^c$ determines its
coefficients $c \in \mathcal{C}(\mathbb{N}^*)$ in Lemma 3.2 below.

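Example 2.15 (A system with a synchronized sub-system). We consider a system
of uniform variables, with a subset of equal ones and the remainder independent.
More precisely, let $I \subset\subset \mathbb{N}^*$, $\emptyset \ne K \subset I$ and fix
$i_0 \in K$. $(X_i)_{i \in I} \in \mathcal{X}(d, I)$ is the system satisfying:

(i) the family $X_{K^c \cup \{i_0\}}$ is uniform on $\{0, \dots, d-1\}^{K^c \cup \{i_0\}}$;

(ii) $X_i = X_{i_0}$ for all $i \in K$.

It follows that

\[ H(X_S) = \left( |S \setminus K| + 1_{(S \cap K \ne \emptyset)} \right) \log d \]

and therefore

\[ \mathrm{MI}(S) = \left( 1_{(S \cap K \ne \emptyset)} + 1_{(S^c \cap K \ne \emptyset)} - 1 \right) \log d, \]

i.e. $\mathrm{MI}(S) = 0$ unless $S$ and $S^c$ both intersect $K$ and then
$\mathrm{MI}(S) = \log d$. Thus

\[ \mathcal{I}^c(X) = \log d \sum_{S \subset I} c^I_S\, 1_{(\emptyset \ne S \cap K \ne K)}, \qquad H(X) = (|K^c| + 1) \log d. \qquad \Box \]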

3. Weak additivity, projectivity and representation



In this section we prove Theorem 1.2 by studying the additivity of mutual information
functionals and characterizing it in terms of the coefficients. We establish a
probabilistic representation of all intricacies and check that the neural complexity
is indeed an intricacy. We conclude this section by some useful consequences of this
representation.

Throughout this section, $X = (X_i)_{i \in I}$ and $Y = (Y_j)_{j \in J}$ will be two systems
defined on the same probability space and we shall consider the joint family
$(X, Y) = \{X_i, Y_j : i \in I, j \in J\}$. $(X, Y)$ is again a system and its index set is the
disjoint union $I \sqcup J$ of $I$ and $J$.

3.1. Projectivity and Additivity. We show that weak additivity and exchangeability
can be read off the coefficients and that non-null intricacies are neither
sub-additive nor super-additive.

Proposition 3.1. Let $\mathcal{I}^c$ be a mutual information functional. Then

(1) $\mathcal{I}^c$ is exchangeable if and only if $c^I_S$ depends only on $|I|$ and $|S|$.

(2) $\mathcal{I}^c$ is weakly additive if and only if the coefficients are projective, i.e., satisfy

\[ \forall\, I \subset\subset \mathbb{N}^*,\ \forall\, J \subset\subset \mathbb{N}^* \setminus I,\ \forall\, S \subset I, \qquad c^I_S = \sum_{T \subset J} c^{I \cup J}_{S \cup T}. \tag{3.1} \]

(3) Let $\mathcal{I}^c$ be an intricacy. Then, for not necessarily independent systems $X$, $Y$,
we have: $\mathcal{I}^c(X, Y) \ge \max\{ \mathcal{I}^c(X), \mathcal{I}^c(Y) \}$ and the approximate
additivity:

\[ \left| \mathcal{I}^c(X) + \mathcal{I}^c(Y) - \mathcal{I}^c(X, Y) \right| \le \mathrm{MI}(X, Y); \]

(4) $\mathcal{I}^c$ can fail to be super-additive or sub-additive.

To prove this proposition we shall need the following fact:

Lemma 3.2. Let $d \ge 2$ and $I$ be a finite set. The data $\mathcal{I}^c(X)$ for
$X \in \mathcal{X}(d, J)$ for all $J \subset\subset I$ determine $c \in \mathcal{C}(I)$.

Proof. Using $c^I_{S^c} = c^I_S$, we restrict ourselves to coefficients with
$|S| \le |S^c|$, i.e., $|S| \le |I|/2$. Let us first consider a system
$(X_i)_{i \in I} \in \mathcal{X}(d, I)$ where all variables are equal: $X_i = X_j$ for all
$i, j \in I$ and $X_i$ is uniform on $\{0, \dots, d-1\}$. Then
$\mathrm{MI}(S) := \mathrm{MI}(X_S, X_{S^c}) = 0$ for $S = \emptyset$ or $S = I$, otherwise
$\mathrm{MI}(S) = \log d$. Hence, using the normalization $1 = \sum_S c^I_S$:

\[ \frac{\mathcal{I}^c(X)}{\log d} = \sum_{S \notin \{\emptyset, I\}} c^I_S = 1 - c^I_\emptyset - c^I_I = 1 - 2 c^I_\emptyset. \]

In particular, $c^I_\emptyset = c^I_I = (1 - \mathcal{I}^c(X)/\log d)/2$.

For each $K \subset I$, let $X^K$ be the system as in Example 2.15. Fix $i_0 \in K$. Recall
that $\mathrm{MI}(S) := \mathrm{MI}(X_S, X_{S^c})$ is 0 if $S \supset K$ or $S^c \supset K$, and
is $\log d$ otherwise. Assume by induction that, for $1 \le s \le |I|/2$, $c^I_S$ is
determined for $|S| < s$ (a trivial assertion for $s = 1$). Picking $K \subset I$ with
$|K| = |I| - s \ge |I|/2 \ge |K^c| = s$, we get:

• if $|S| < s$, we say nothing of $\mathrm{MI}(S)$ but will use the inductive assumption;

• if $S = K$ or $S = K^c$, then $\mathrm{MI}(S) = 0$;

• if $s \le |S| \le |I|/2$: $S \supset K$ implies $S = K$, and $S \subset K^c$ implies
$S = K^c$ since $s = |K^c|$. In all other cases: $\mathrm{MI}(S) = \log d$.

Therefore, 

X'^iX'^) jMl{S) R{X^) 

logc/ ^ ^ logd logo? 



imS) ^ , y MI(g) H(X^) 
logfi logc? logfi 



|S'|<|7|/2 ^ 151 = 1/1/2 



lS|<s ^ s<|5|<|/|/2 |5| = |/|/2 ^ 

(the sum over $|S| = |I|/2$ is non-zero only if $|I|$ is even). Using $\sum_S c^I_S = 1$ and
$c^I_S = c^I_{S^c}$, we get:
J'=(X^) , H(X) ,_,^r fMliS) 



logd logrf , 

It follows that $c^I_K = c^I_{K^c}$ is determined for any $K$ with $|K| = s$. This completes the
induction step and the proof of the lemma. □

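Proof of Proposition 3.1. The characterization of exchangeability is a direct consequence of Lemma 3.2.

Let us prove the second point. We first check that weak additivity implies projectivity. For any $X \in \mathcal{X}(d,I)$ with $I \subset\subset \mathbb{N}^*$ and $J \subset\subset \mathbb{N}^* \setminus I$, we have:

$$\mathcal{I}^c(X) = \mathcal{I}^c(X,Z) = \sum_{S \subset I} \sum_{T \subset J} c^{I \cup J}_{S \cup T}\, \mathrm{MI}(X_S, X_{S^c})$$

for $Z = (Z_j)_{j \in J}$ with each $Z_j$ a.s. constant and therefore independent of $X$. Lemma 3.2 then implies that (3.1) holds. Moreover, (A.6) yields the monotonicity claim of point (3).

For the approximate additivity, we consider (A.7) for any $S \subset I$, $T \subset J$:

$$\mathrm{MI}((X_S, Y_T), (X_{S^c}, Y_{T^c})) = \mathrm{MI}(X_S, X_{S^c}) + \mathrm{MI}(Y_T, Y_{T^c}) \pm \mathrm{MI}(X,Y)$$

where $\pm\mathrm{MI}(X,Y)$ denotes a number belonging to $[-\mathrm{MI}(X,Y), \mathrm{MI}(X,Y)]$. The projectivity now gives:

$$\mathcal{I}^c(X,Y) = \sum_{S \subset I,\, T \subset J} c^{I \cup J}_{S \cup T}\, \mathrm{MI}(S \cup T)$$

$$= \sum_{S \subset I,\, T \subset J} c^{I \cup J}_{S \cup T} \left( \mathrm{MI}(X_S, X_{S^c}) + \mathrm{MI}(Y_T, Y_{T^c}) \pm \mathrm{MI}(X,Y) \right)$$

$$= \mathcal{I}^c(X) + \mathcal{I}^c(Y) \pm \mathrm{MI}(X,Y).$$

This is the approximate additivity of point (3). If $X$ and $Y$ are independent, then $\mathrm{MI}(X,Y) = 0$, proving the weak additivity.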






We finally give the counter-examples. For sub-additivity, it is enough to assume the intricacy to be non-null and to consider $X = Y$ a single random variable uniform on $\{1,2\}$ and compute:

$$\mathcal{I}^c(X) = \mathcal{I}^c(Y) = 0 \quad \text{whereas} \quad \mathcal{I}^c(X,Y) = 2 c^2_1 \log 2 > 0.$$



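For super-additivity, we assume $c^N_0 + c^N_N < \frac{1}{2} + \frac{1}{2}\left(c^{2N}_0 + c^{2N}_{2N}\right)$ and take $X = Y$ a collection of $N = |I|$ copies of the same variable uniform over $\{0,1\}$. Then $\mathrm{MI}(S) = \log 2$ except if $S \in \{\emptyset, I\}$, in which case $\mathrm{MI}(S) = 0$. By Example 2.12,

$$\frac{\mathcal{I}^c(X,Y)}{\log 2} = 1 - c^{2N}_0 - c^{2N}_{2N} < 2\left(1 - c^N_0 - c^N_N\right) = \frac{\mathcal{I}^c(X) + \mathcal{I}^c(Y)}{\log 2}.$$

□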
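The sub-additivity counter-example and the approximate additivity bound can be checked numerically on very small systems. The following is only a sketch under stated assumptions: the functional is taken to be the plain weighted sum of MI(X_S, X_{S^c}) over all subsets S (consistent with the value 2 c^2_1 log 2 above), the weights are the Edelman-Sporns-Tononi ones, and all function names are hypothetical.

```python
from itertools import combinations
from math import comb, log

def entropy(pmf):
    """Shannon entropy (natural log) of a dict {outcome: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def marginal(pmf, S):
    """Law of the sub-family (X_i)_{i in S}; outcomes are tuples."""
    out = {}
    for omega, p in pmf.items():
        key = tuple(omega[i] for i in S)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_information(pmf, S, n):
    """MI(X_S, X_{S^c}) = H(X_S) + H(X_{S^c}) - H(X)."""
    Sc = tuple(i for i in range(n) if i not in S)
    return entropy(marginal(pmf, S)) + entropy(marginal(pmf, Sc)) - entropy(pmf)

def est_coefficient(n, k):
    """Assumed Edelman-Sporns-Tononi weight c^n_k = 1 / ((n+1) * C(n,k))."""
    return 1.0 / ((n + 1) * comb(n, k))

def intricacy(pmf, n, coeff=est_coefficient):
    """Sum over all subsets S of c^n_{|S|} * MI(X_S, X_{S^c})."""
    total = 0.0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total += coeff(n, k) * mutual_information(pmf, S, n)
    return total

# X = Y is one fair bit, so the pair (X, Y) charges (0,0) and (1,1) equally.
single = {(0,): 0.5, (1,): 0.5}
pair = {(0, 0): 0.5, (1, 1): 0.5}
i_x = intricacy(single, 1)
i_xy = intricacy(pair, 2)
mi_xy = mutual_information(pair, (0,), 2)
print(i_x, i_xy, mi_xy)
```

With these weights c^2_1 = 1/6, so the pair has intricacy (log 2)/3 while each single variable has intricacy 0; the defect stays below MI(X,Y) = log 2, as the approximate additivity requires.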

3.2. Probabilistic representation of intricacies. In this section, we give the 
probabilistic representation for intricacies. This will provide us with a way to esti- 
mate the maximal value of intricacy for large systems in [5]. For notational conve- 
nience, we consider intricacies over the positive integers N*. 

We say that a random variable $W$ over $[0,1]$ is symmetric if $W$ and $1 - W$ have
the same law. A measure on $[0,1]$ is symmetric if it is the law of a symmetric random
variable.

Proposition 3.3. Let $\mathcal{I}^c$ be a mutual information functional defined by some system of coefficients $c \in \mathcal{C}(\mathbb{N}^*)$ over some infinite index set, which we assume to be $\mathbb{N}^*$ for notational convenience.

(1) $\mathcal{I}^c$ is an intricacy, i.e., it is exchangeable and weakly additive, if and only if there exists a symmetric random variable $W_c$ over $[0,1]$ with law $\lambda_c$ such that for all $I \subset\subset \mathbb{N}^*$ and $S \subset I$

$$c^I_S = E\left(W_c^{|S|} (1 - W_c)^{|I| - |S|}\right) = \int_{[0,1]} x^{|S|} (1-x)^{|I| - |S|}\, \lambda_c(dx). \tag{3.2}$$

(2) Formula (3.2) is equivalent to

$$c^I_S = P(Z \cap I = S), \qquad \forall I \subset\subset \mathbb{N}^*,\ \forall S \subset I, \tag{3.3}$$

where $Z$ is the random subset of $\mathbb{N}^*$

$$Z := \{i \in \mathbb{N}^* : Y_i \geq W_c\}, \tag{3.4}$$

with $(Y_i)_{i \geq 1}$ an i.i.d. sequence of uniform random variables on $[0,1]$, independent of $W_c$.

(3) If $\mathcal{I}^c$ is an intricacy, then the law $\lambda_c$ of $W_c$ is uniquely determined by $\mathcal{I}^c$. Moreover for all $X \in \mathcal{X}(d,I)$ independent of $Z$

$$\mathcal{I}^c(X) = E(\mathrm{MI}(Z \cap I)), \qquad \mathrm{MI}(S) := \mathrm{MI}(X_S, X_{I \setminus S}).$$

Remark 3.4. The definition (3.4) of the random set $Z$ is a particular case of the so-called Kingman paintbox construction, see [H §2.3]. In this setting, it yields a random exchangeable partition of $\mathbb{N}^*$ into a subset $Z$ and its complement, with asymptotic densities a.s. equal to $1 - W_c$, respectively $W_c$. Therefore it is natural to expect a similar probabilistic representation for the coefficients of exchangeable and weakly additive generalized functionals defined in (1.9) and (1.10).

After the proof of the proposition we give the measures /i, /i^, /i^ representing re- 
spectively T, and T^. We start with the following 

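Lemma 3.5. Let $\mathcal{C}(\mathbb{N}^*)$ be the set of systems of coefficients of intricacies. Let $\mathcal{P}_S([0,1])$ be the set of symmetric probability measures $\lambda$ on $[0,1]$. Then, the map $\lambda \mapsto c$ defined from $\mathcal{P}_S([0,1])$ to $\mathcal{C}(\mathbb{N}^*)$ according to ($n := |I|$, $k := |S|$):

$$c^I_S = c^n_k = \int_{[0,1]} x^k (1-x)^{n-k}\, \lambda(dx), \qquad \forall S \subset I \subset\subset \mathbb{N}^*, \tag{3.5}$$

is a bijection.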



Proof of Lemma 3.5. We first show that for an exchangeable weakly additive $\mathcal{I}^c$, there exists a probability measure $\lambda$ on $[0,1]$ such that

$$c^{n+k}_n = \int_{[0,1]} x^n (1-x)^k\, \lambda(dx), \qquad n \geq 1,\ k \geq 0, \tag{3.6}$$

i.e., the main claim of the Lemma, up to a convenient renumbering. We need the following classical moment result, see e.g. [9, VII.3].

Lemma 3.6. Let $(a_n)_{n \geq 1}$ be a sequence of numbers in $[0,1]$. We define $(Da)_n := a_n - a_{n+1}$, $n \geq 1$. There exists a probability measure $\lambda$ on $[0,1]$ such that $a_n = \int x^n\, \lambda(dx)$ if and only if

$$(D^k a)_n \geq 0, \qquad \forall n \geq 1,\ \forall k \geq 1.$$

Moreover such $\lambda$ is unique.

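Remark that, setting $N = |I|$ and $M = |J|$, projectivity is equivalent to

$$c^N_k = \sum_{i=0}^{M} \binom{M}{i}\, c^{N+M}_{k+i}, \qquad \forall\, 0 \leq k \leq N. \tag{3.7}$$

For $M = 1$ we obtain

$$c^{N+1}_k + c^{N+1}_{k+1} = c^N_k, \qquad \forall\, 0 \leq k \leq N. \tag{3.8}$$

Let us set $m_n := c^n_n$. One proves easily by (3.8) and recurrence on $k$ that

$$(D^k m)_n = c^{n+k}_n \in [0,1], \qquad \forall k, n \geq 1.$$

Therefore $(m_n)_{n \geq 1}$ defines a unique measure $\lambda$ satisfying (3.6) for $k = 0$. (3.6) for general $k$ follows by induction from:

$$c^{n+k+1}_n = c^{n+k}_n - c^{n+k+1}_{n+1} = \int_{[0,1]} \left[x^n (1-x)^k - x^{n+1} (1-x)^k\right] d\lambda = \int_{[0,1]} x^n (1-x)^{k+1}\, d\lambda.$$

Thus $\lambda$ is the unique solution to the claim of the Lemma. This uniqueness, together with $c^n_k = c^n_{n-k}$, implies that $\lambda$ is symmetric. Thus any intricacy defines a measure as claimed.

We turn to the converse, considering a symmetric measure $\lambda$ on $[0,1]$ and defining $c$ by means of (3.5). The coefficients depending only on the cardinalities, $\mathcal{I}^c$ is trivially exchangeable. The symmetry of $\lambda$ yields immediately $c^n_k = c^n_{n-k}$, and the normalization condition is given by

$$\sum_{k=0}^{n} \binom{n}{k} c^n_k = \int_{[0,1]} \sum_{k=0}^{n} \binom{n}{k} x^k (1-x)^{n-k}\, \lambda(dx) = 1,$$

i.e. $c \in \mathcal{C}(\mathbb{N}^*)$. To prove the projectivity of $c$, namely (3.7), we compute:

$$\sum_{i=0}^{M} \binom{M}{i} c^{N+M}_{k+i} = \int_{[0,1]} \sum_{i=0}^{M} \binom{M}{i} x^{k+i} (1-x)^{N+M-k-i}\, \lambda(dx) = \int_{[0,1]} x^k (1-x)^{N-k}\, \lambda(dx) = c^N_k.$$

Thus (3.7) and projectivity follow. The Lemma is proved. □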
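The converse direction of the lemma can be illustrated with exact rational arithmetic. The sketch below assumes the two-point symmetric law (δ_p + δ_{1−p})/2 with the arbitrary choice p = 1/3, and checks normalization, the symmetry c^n_k = c^n_{n−k} and the projectivity relation (3.7) for small sizes; the helper name `coeff` is hypothetical.

```python
from fractions import Fraction
from math import comb

def coeff(n, k, p):
    """c^n_k = integral of x^k (1-x)^(n-k) against (delta_p + delta_{1-p}) / 2."""
    q = 1 - p
    return (p**k * q**(n - k) + q**k * p**(n - k)) / 2

p = Fraction(1, 3)          # arbitrary illustrative choice of p
N_MAX, M_MAX = 6, 4

# Normalization: the weights of all subsets of an n-set add up to 1.
normalized = all(
    sum(comb(n, k) * coeff(n, k, p) for k in range(n + 1)) == 1
    for n in range(1, N_MAX + 1)
)
# Symmetry under complementation of S.
symmetric = all(
    coeff(n, k, p) == coeff(n, n - k, p)
    for n in range(1, N_MAX + 1) for k in range(n + 1)
)
# Projectivity (3.7): summing out M extra indices recovers c^N_k.
projective = all(
    coeff(n, k, p) == sum(comb(m, i) * coeff(n + m, k + i, p) for i in range(m + 1))
    for n in range(1, N_MAX + 1) for k in range(n + 1) for m in range(1, M_MAX + 1)
)
print(normalized, symmetric, projective)
```

Using `Fraction` keeps every comparison exact, so the three identities are verified as equalities rather than up to rounding.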

Proof of Proposition 3.3. First, let $\mathcal{I}^c$ be an intricacy. Lemma 3.5 yields a symmetric probability measure $\lambda_c$ on $[0,1]$ satisfying (3.5). If $W_c$ is a random variable with law $\lambda_c$, then (3.2) is equivalent to (3.5).

Conversely, suppose that $c = (c^I_S)_{S \subset I}$ has the form (3.2) for some probability $P$ defined by $W_c, Y_1, Y_2, \ldots$ as in the statement. Obviously $c^I_S \geq 0$ and $\sum_{S \subset I} c^I_S = 1$. $c^I_S = c^I_{S^c}$ follows from the symmetry of $W_c$. Thus $c$ is a system of coefficients. Exchangeability of $c$ follows from exchangeability of the random variables $1_{(Y_i \geq W_c)}$, $i \in I$. By (3.2) we know that

$$c^I_S = c^{|I|}_{|S|} = E\left((1 - W_c)^{|S|}\, W_c^{|I| - |S|}\right) = \int_{[0,1]} (1-x)^{|S|} x^{|I| - |S|}\, \lambda_c(dx).$$

Therefore, by Lemma 3.5, the functional $\mathcal{I}^c$ is an intricacy.

Let now $W_c, Y_1, \ldots$ be defined as in point 2 of the statement and $Z$ defined by (3.4). Each $i \in \mathbb{N}^*$ belongs to the random set $Z$ if and only if $Y_i \geq W_c$. Conditionally on $W_c$, the probability of $\{Y_i \geq W_c\}$ is therefore $1 - W_c$. As the variables $Y_1, Y_2, \ldots$ are independent:

$$P(Z \cap I = S \mid W_c) = (1 - W_c)^{|S|}\, W_c^{|I| - |S|}.$$

Averaging over the values of $W_c$ we obtain

$$P(Z \cap I = S) = E\left((1 - W_c)^{|S|}\, W_c^{|I| - |S|}\right)$$

and therefore (3.2) and (3.3) are equivalent. The last assertion follows from Lemma 3.5 and from (3.3). The Proposition is proved. □
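The paintbox representation (3.3)-(3.4) lends itself to a quick simulation. The sketch below is an illustration only: it assumes W_c uniform on [0,1], uses a fixed seed, and compares the empirical frequency of one event {Z ∩ I = S} with the Beta integral 1/((n+1) C(n,|S|)); the function names and the sample size are arbitrary choices.

```python
import random
from math import comb

def sample_paintbox(n, rng):
    """Draw Z intersected with {1..n}: pick W uniform on [0,1], keep i when Y_i >= W."""
    w = rng.random()
    return frozenset(i for i in range(1, n + 1) if rng.random() >= w)

def estimate(n, target, trials, seed=0):
    """Empirical frequency of the event {Z ∩ {1..n} = target}."""
    rng = random.Random(seed)
    hits = sum(sample_paintbox(n, rng) == target for _ in range(trials))
    return hits / trials

n, target = 4, frozenset({1, 2})
freq = estimate(n, target, trials=200_000)
exact = 1 / ((n + 1) * comb(n, len(target)))   # integral of x^2 (1-x)^2 on [0,1]
print(freq, exact)
```

Only the cardinality of the target set matters, which is the exchangeability of the coefficients; replacing `{1, 2}` by any other two-element subset should give a frequency close to the same value.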
3.3. Examples of intricacies. We show that the Edelman-Sporns-Tononi neural complexity (1.1) and the uniform and p-symmetric intricacies correspond to natural probability laws on $[0,1]$. In particular, they are weakly additive and really intricacies:

Lemma 3.7. In the setting of Lemma 3.5,

(1) If $W_c$ is uniform on $[0,1]$ then $\mathcal{I}^c$ is the Edelman-Sporns-Tononi neural complexity (1.1).

(2) If $W_c$ is uniform on $\{p, 1-p\}$ then $\mathcal{I}^c$ is the p-symmetric intricacy $\mathcal{I}^p$; in the case $p = 1/2$, $W_c = \frac{1}{2}$ a.s. yields the uniform intricacy $\mathcal{I}^U$.

Proof. Let $W_c$ be uniform on $[0,1]$. Then

$$P(Z \cap I = \{1,\dots,k\}) = P(Z_1 = \cdots = Z_k = 1,\ Z_{k+1} = \cdots = Z_N = 0) = \int_{[0,1]} x^k (1-x)^{N-k}\, dx =: a(k, N-k).$$

We claim now that for all $k \geq 1$ and $j \geq 0$

$$a(k,j) = \frac{j!}{(k+1) \cdots (k+j+1)} = \frac{1}{(k+j+1) \binom{k+j}{k}},$$

i.e., the Edelman-Sporns-Tononi coefficient $c^{k+j}_k$.

Indeed, for $j = 0$ this reduces to $\int_{[0,1]} x^k\, dx = 1/(k+1)$. To prove the general case, one uses recurrence on $j$, for all $k$ simultaneously. Indeed, suppose we have the result for $j \geq 0$. Then

$$\int_{[0,1]} x^k (1-x)^{j+1}\, dx = \int_{[0,1]} x^k (1-x)^j\, dx - \int_{[0,1]} x^{k+1} (1-x)^j\, dx = \frac{1}{(k+j+1) \binom{k+j}{k}} - \frac{1}{(k+j+2) \binom{k+j+1}{k+1}} = \frac{1}{(k+j+2) \binom{k+j+1}{k}}.$$

If $W_c$ is uniform over $\{p, 1-p\}$ then

$$\int_{[0,1]} x^k (1-x)^{N-k}\, \tfrac{1}{2}(\delta_p + \delta_{1-p})(dx) = \tfrac{1}{2}\left(p^k (1-p)^{N-k} + (1-p)^k p^{N-k}\right)$$

which is the coefficient of $\mathcal{I}^p$. □
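The recurrence on j used in the proof can be replayed exactly. The sketch below is a sanity check and not part of the original argument: it computes a(k, j) from the base case a(k, 0) = 1/(k+1) and the relation a(k, j+1) = a(k, j) − a(k+1, j), and compares with the closed form; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def a_closed(k, j):
    """Closed form 1 / ((k+j+1) * C(k+j, k)) for the integral of x^k (1-x)^j on [0,1]."""
    return Fraction(1, (k + j + 1) * comb(k + j, k))

def a_recursive(k, j):
    """Same integral via a(k,0) = 1/(k+1) and a(k,j+1) = a(k,j) - a(k+1,j)."""
    if j == 0:
        return Fraction(1, k + 1)
    return a_recursive(k, j - 1) - a_recursive(k + 1, j - 1)

agree = all(
    a_recursive(k, j) == a_closed(k, j)
    for k in range(0, 8) for j in range(0, 8)
)
print(agree, a_closed(2, 2))
```

The step from j to j+1 at a given k calls the value at k+1, which is why the induction has to be carried for all k at once.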

3.4. Further properties. We deduce some useful facts from the above represen- 
tation. 

Lemma 3.8. The following are equivalent for an intricacy $\mathcal{I}^c$ with associated measure $\lambda_c$ as in Lemma 3.5:

(1) $\mathcal{I}^c$ is non-null, i.e. $c^N_k > 0$ for at least one choice of $N \geq 2$ and $1 \leq k < N$;

(2) $c^N_k > 0$ for all $N \geq 2$ and $1 \leq k \leq N - 1$;

(3) $\lambda_c(]0,1[) > 0$.

Proof. We have:

$$c^n_j = \int_{[0,1]} x^j (1-x)^{n-j}\, \lambda_c(dx)$$

with $x^j (1-x)^{n-j}$ zero exactly at $x \in \{0,1\}$ whenever $0 < j < n$ and strictly positive on $]0,1[$. Thus (1) $\Rightarrow$ (3) $\Rightarrow$ (2) $\Rightarrow$ (1). □

Lemma 3.9. If $\mathcal{I}^c$ is non-null, then $\mathcal{I}^c(X) = 0$ for an $X \in \mathcal{X}(d,N)$ if and only if $X = (X_1, \dots, X_N)$ is an independent family.

Proof. It is enough to show that: $\mathcal{I}^c(X) = 0 \Rightarrow H(X) = \sum_{i \in I} H(X_i)$. If $\mathcal{I}^c$ is non-null and $\mathcal{I}^c(X) = 0$, then by Lemma 3.8 we have $\mathrm{MI}(S) = 0$ for all $S \subset I$ with $S \notin \{\emptyset, I\}$. Therefore $H(X) = H(X_S) + H(X_{S^c})$ and an easy induction yields the claim. □

Example 3.10 (Systems with intricacy proportional to $N$). We generalize the result of Example 2.14 to a non-null intricacy $\mathcal{I}^c$. Considering the same system $X$ as in Example 2.14, we get by Proposition 3.3:

I^{X) 



E4|A.I=E(|A,„I) 

^ sci 

N-l 

= nMk) ^u{k+ 1)) = (iv - 1) p(i^(i) ^ i2(2)). 



fc=i 



By the probabilistic representation (3.2) through a random variable $W_c$ with law $\lambda_c$ on $[0,1]$,

$$\kappa_c := P(1_Z(1) \neq 1_Z(2)) = \int_{[0,1]} 2x(1-x)\, \lambda_c(dx) \in\, ]0, 1/2]. \tag{3.9}$$

Then we have obtained a system $X \in \mathcal{X}(d^2, N)$ such that

$$\mathcal{I}^c(X) = \kappa_c (N-1) \log d. \tag{3.10}$$

□

4. Bounds for maximal intricacies 

In this section we prove Theorem 1.3. We recall the definition (3.9) for a non-null intricacy $\mathcal{I}^c$:

$$\kappa_c = 2 \int_{[0,1]} x(1-x)\, \lambda_c(dx) = 2 c^2_1 > 0. \tag{4.1}$$

Recall that $\mathcal{I}^c(d,N)$ and $\mathcal{I}^c(d,N,x)$, defined in (1.4) and (1.5), denote the maximum of $\mathcal{I}^c$ over $\mathcal{X}(d,N)$, respectively over $\{\mu \in \mathcal{M}(d,N) : H(\mu) = xN \log d\}$. We are going to show the following

Proposition 4.1. Let $\mathcal{I}^c$ be a non-null intricacy and $d \geq 2$. Then for all $N \geq 2$

$$\frac{\kappa_c \log d}{2} \left(1 - \frac{1}{N}\right) \leq \frac{\mathcal{I}^c(d,N)}{N} \leq \frac{\log d}{2} \tag{4.2}$$

and for any $x \in [0,1]$

$$\kappa_c\, [x \wedge (1-x)] \log d \left(1 - \frac{1}{N}\right) \leq \frac{\mathcal{I}^c(d,N,x)}{N} \leq \frac{1}{2} \log d, \tag{4.3}$$

where $\kappa_c > 0$ is defined in (4.1).

Proof. The upper bound for $\mathcal{I}^c(d,N)/N$ follows from Lemma 2.6. We show now the lower bound for $\mathcal{I}^c(d,N,x)/N$. Let $x \in\, ]0,1[$. In Example 2.13 we have constructed a system $X = (X_1, X_2) \in \mathcal{X}(d,2)$ with

$$H(X) = x \log d^2, \qquad \mathcal{I}^c(X) = 2 \kappa_c\, [x \wedge (1-x)] \log d > 0.$$

Let now $(Y_{2i+1}, Y_{2i+2})_{i \geq 0}$ be an i.i.d. family of copies of $(X_1, X_2)$. Then, for $M \geq 1$, $Y := (Y_i)_{i=1,\dots,2M} \in \mathcal{X}(d,2M)$ is the product of $M$ independent copies of $(X_1, X_2)$ and by weak additivity

$$\mathcal{I}^c(Y) = M\, \mathcal{I}^c(X) = 2M \kappa_c\, [x \wedge (1-x)] \log d, \qquad H(Y) = 2Mx \log d.$$

If $S$ is a $\{0,\dots,d-1\}$-valued random variable independent of $Y$ with $H(S) = x \log d$, then $Z := (Y_1, \dots, Y_{2M}, S) \in \mathcal{X}(d, 2M+1)$ satisfies by weak additivity

$$\mathcal{I}^c(Z) = \mathcal{I}^c(Y_1, \dots, Y_{2M}) = 2M \kappa_c\, [x \wedge (1-x)] \log d, \qquad H(Z) = (2M+1) x \log d.$$

Setting $N = 2M$, respectively $N = 2M+1$, we obtain the lower bound for $\mathcal{I}^c(d,N,x)/N$. Taking the supremum over $x \in [0,1]$ in (4.3), we obtain (4.2). □
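For orientation, the constant κ_c and the resulting window for the maximal intricacy per variable can be evaluated in a few lines. This is a sketch under assumptions: the lower bound is read as κ_c [x ∧ (1−x)] log d (1 − 1/N), with x ∧ (1−x) replaced by 1/2 when no entropy level is prescribed, and the function names are hypothetical.

```python
from fractions import Fraction
from math import log

def kappa_two_point(p):
    """kappa_c = 2 * integral of x(1-x) for the law (delta_p + delta_{1-p}) / 2."""
    return 2 * p * (1 - p)

# 2 * integral over [0,1] of x(1-x) dx, i.e. W_c uniform (neural complexity).
KAPPA_UNIFORM_LAW = Fraction(1, 3)

def bounds_per_variable(kappa, d, n, x=None):
    """Assumed lower/upper bounds on max intricacy divided by n (x=None: no entropy constraint)."""
    weight = 0.5 if x is None else min(x, 1 - x)
    lower = float(kappa) * weight * log(d) * (1 - 1 / n)
    upper = 0.5 * log(d)
    return lower, upper

print(kappa_two_point(Fraction(1, 2)))            # uniform intricacy: kappa = 1/2
print(bounds_per_variable(KAPPA_UNIFORM_LAW, d=2, n=10))
print(bounds_per_variable(KAPPA_UNIFORM_LAW, d=2, n=10, x=0.25))
```

Since κ_c never exceeds 1/2, the lower bound is at most half of the upper bound, so these elementary estimates pin down the growth rate only up to a constant factor.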

4.1. Super-additivity. We are going to prove that the maps $N \mapsto \mathcal{I}^c(d,N)$ and $N \mapsto \mathcal{I}^c(d,N,x)$ are super-additive. By Lemma 2.4 the suprema defining $\mathcal{I}^c(d,N)$ and $\mathcal{I}^c(d,N,x)$ are maxima. The measures achieving the first supremum are called maximal intricacy measures.

Lemma 4.2. For any intricacy $\mathcal{I}^c$ and $d \geq 2$, the following limits exist. First,

$$\mathcal{I}^c(d) = \lim_{N \to \infty} \frac{\mathcal{I}^c(d,N)}{N} = \sup_{N \geq 1} \frac{\mathcal{I}^c(d,N)}{N} \in\, ]0, +\infty[ \tag{4.4}$$

and, for each $x \in\, ]0,1[$,

$$\mathcal{I}^c(d,x) = \lim_{N \to \infty} \frac{\mathcal{I}^c(d,N,x)}{N} = \sup_{N \geq 1} \frac{\mathcal{I}^c(d,N,x)}{N} \in\, ]0, +\infty[. \tag{4.5}$$

Proof. We prove (4.5), (4.4) being similar and simpler. Fix $x \in\, ]0,1[$. For each $N \geq 1$, let $a_N := \mathcal{I}^c(d,N,x)$. We claim that this sequence is super-additive, i.e.,

$$a_{N+M} \geq a_N + a_M, \qquad \forall N, M \geq 1.$$

Indeed, let $X^N$ and $X^M$ be such that

$$\mathcal{I}^c(X^N) = \mathcal{I}^c(d,N,x), \qquad H(X^N) = xN \log d,$$

$$\mathcal{I}^c(X^M) = \mathcal{I}^c(d,M,x), \qquad H(X^M) = xM \log d.$$

Assume that $X^N$ and $X^M$ are independent. By weak additivity

$$\mathcal{I}^c(X^N, X^M) = \mathcal{I}^c(X^N) + \mathcal{I}^c(X^M),$$

$$H(X^N, X^M) = H(X^N) + H(X^M) = x(N+M) \log d.$$

Thus,

$$a_N + a_M = \mathcal{I}^c(d,N,x) + \mathcal{I}^c(d,M,x) = \mathcal{I}^c(X^N) + \mathcal{I}^c(X^M) = \mathcal{I}^c(X^N, X^M) \leq \mathcal{I}^c(d,N+M,x) = a_{N+M}.$$

Moreover, by Proposition 4.1, we have $\sup_{N \geq 1} a_N/N \leq (\log d)/2$. Therefore, by Fekete's Lemma, $a_N/N \to \sup_{M \geq 1} a_M/M \leq (\log d)/2$ as $N \to +\infty$. Moreover, the limit is positive by (4.3). □
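Fekete's Lemma, as used above, states that for a super-additive sequence the ratios a_N/N converge to their supremum. The toy sequence below is purely an illustrative assumption (it is not the sequence of maximal intricacies); it is super-additive because sqrt(n+m) ≤ sqrt(n) + sqrt(m).

```python
from math import sqrt

def a(n):
    """Toy super-additive sequence a_n = n - sqrt(n) (an illustrative assumption)."""
    return n - sqrt(n)

LIMIT = 60
# Check a_{n+m} >= a_n + a_m on a finite range, with a small float tolerance.
superadditive = all(
    a(n + m) >= a(n) + a(m) - 1e-12
    for n in range(1, LIMIT) for m in range(1, LIMIT)
)
# The ratios a_n / n = 1 - 1/sqrt(n) increase towards their supremum 1.
ratios = [a(n) / n for n in (1, 10, 100, 10_000, 1_000_000)]
print(superadditive, ratios)
```

In the lemma the role of the uniform bound on a_N/N is played by Proposition 4.1, which guarantees that the supremum is finite.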

4.2. Adjusting Entropy. To strengthen the previous result to obtain the second assertion of Theorem 1.3, we must adjust the entropy without significantly changing the intricacy.

Lemma 4.3. Let $X^{(1)}, \dots, X^{(r)} \in \mathcal{X}(d,N)$. Let $U$ be a random variable over $\{1,\dots,r\}$, independent of $\{X^{(1)}, \dots, X^{(r)}\}$. Let $Y := X^{(U)} \in \mathcal{X}(d,N)$, i.e., $Y = X^{(u)}$ whenever $U = u$. Then:

$$0 \leq H(Y_S) - \sum_{u=1}^{r} P(U = u)\, H(X^{(u)}_S) \leq \log r, \qquad \forall S \subset \{1,\dots,N\}, \tag{4.6}$$

$$-\log r \leq \mathcal{I}^c(Y) - \sum_{u=1}^{r} P(U = u)\, \mathcal{I}^c(X^{(u)}) \leq 2 \log r. \tag{4.7}$$

Proof. We first prove (4.6). By (A.2),

$$H(Y_S \mid U) \leq H(Y_S) \leq H(Y_S, U) = H(Y_S \mid U) + H(U).$$

Now $H(U) \leq \log r$. (4.6) follows as:

$$H(Y_S \mid U) = \sum_{u=1}^{r} P(U = u)\, H(Y_S \mid U = u) = \sum_{u=1}^{r} P(U = u)\, H(X^{(u)}_S).$$

(4.7) follows immediately, using (2.2) and (4.6). □
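The entropy estimate (4.6) for mixtures is easy to test on a small example. The sketch below takes S to be the full index set and mixes three laws on {0,1}^2 with illustrative weights; the laws, the weights and the function names are assumptions made for the example.

```python
from math import log

def entropy(pmf):
    """Shannon entropy (natural log) of a dict {outcome: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def mixture(weights, laws):
    """Law of Y = X^(U), where P(U = u) = weights[u] and U is independent of the X^(u)."""
    out = {}
    for w, law in zip(weights, laws):
        for omega, p in law.items():
            out[omega] = out.get(omega, 0.0) + w * p
    return out

laws = [
    {(0, 0): 1.0},                                   # constant system
    {(0, 0): 0.5, (1, 1): 0.5},                      # two equal fair bits
    {(a, b): 0.25 for a in (0, 1) for b in (0, 1)},  # two independent fair bits
]
weights = [0.2, 0.3, 0.5]

# gap = H(Y) - sum_u P(U = u) H(X^(u)); by (4.6) it lies in [0, log r].
gap = entropy(mixture(weights, laws)) - sum(
    w * entropy(law) for w, law in zip(weights, laws)
)
print(gap, log(len(laws)))
```

The gap equals the mutual information between Y and the label U, which explains both its sign and the bound by H(U) ≤ log r.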



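Lemma 4.4. Let $0 < x < 1$ and $\epsilon > 0$ and $\mathcal{I}^c$ be some non-null intricacy. Then there exist $\delta_0 > 0$ and $N_0 < \infty$ with the following property for all $0 < \delta < \delta_0$ and $N \geq N_0$. For any $X \in \mathcal{X}(d,N)$ such that

$$\left| \frac{H(X)}{N \log d} - x \right| \leq \delta,$$

there exists $Y \in \mathcal{X}(d,N)$ satisfying:

$$H(Y) = xN \log d, \qquad |\mathcal{I}^c(Y) - \mathcal{I}^c(X)| \leq \epsilon N \log d.$$

Proof. We fix $\delta_0 = \delta_0(\epsilon, x) > 0$ so small that:

$$\frac{\delta_0}{\min\{1 - x - \delta_0,\ x - \delta_0\}} < \epsilon/4$$

and $N_0 = N_0(\epsilon, x, \delta_0)$ so large that:

$$\frac{\log 2}{N_0 \min\{1 - x - \delta_0,\ x - \delta_0\} \log d} < \epsilon/4.$$

Let $N \geq N_0$ and $X \in \mathcal{X}(d,N)$ be such that

$$\left| \frac{H(X)}{N \log d} - x \right| \leq \delta < \delta_0.$$

There are two similar cases, depending on whether $H(X)$ is greater or less than $xN \log d$.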
We assume $h := H(X)/(N \log d) < x$ and shall explain at the end the necessary modifications for the other case.

Let $Z = (Z_i,\ i = 1, \dots, N)$ be i.i.d. random variables, uniform over $\{0,\dots,d-1\}$. We consider $Y^t \in \mathcal{X}(d,N)$ defined by

$$Y^t := X\, 1_{(U \leq 1-t)} + Z\, 1_{(U > 1-t)},$$

where $U$ is a uniform random variable over $[0,1]$ independent of $X$ and $Z$. We have $\mathcal{I}^c(Y^0) = \mathcal{I}^c(X)$ and $\mathcal{I}^c(Y^1) = \mathcal{I}^c(Z) = 0$. Moreover $H(Y^0) = H(X) < xN \log d < N \log d = H(Y^1)$. Hence, by the continuity of $t \mapsto H(Y^t)$, we get that there is some $0 < t_0 < 1$ such that $H(Y^{t_0}) = xN \log d$. Let us check that $t_0$ is small.

By (4.6),

$$0 \leq H(Y^t) - (1-t) H(X) - t H(Z) = H(Y^t) - (1-t) h N \log d - t N \log d \leq \log 2,$$

so that, for some $\alpha \in [0,1]$,

$$0 \leq t_0 = \frac{x - h}{1 - h} - \frac{\alpha \log 2}{N (1-h) \log d} \leq \frac{\delta}{1 - x - \delta} \leq \frac{\epsilon}{2},$$

since $\delta < \delta_0$. Thus, by (4.7), setting $Y := Y^{t_0}$,

$$|\mathcal{I}^c(Y) - (1 - t_0) \mathcal{I}^c(X) - t_0 \mathcal{I}^c(Z)| = |\mathcal{I}^c(Y) - (1 - t_0) \mathcal{I}^c(X)| \leq 2 \log 2,$$

and therefore by (4.2)

$$|\mathcal{I}^c(Y) - \mathcal{I}^c(X)| \leq t_0\, \mathcal{I}^c(X) + 2 \log 2 \leq \frac{\epsilon}{4} N \log d + 2 \log 2.$$

Dividing by $N \log d \geq N_0 \log d$ we obtain the desired estimate.

For the case $h > x$, we use instead a system $Z$ with constant variables, so that $H(Z) = 0 = \mathcal{I}^c(Z)$ and a similar argument gives the result. □

4.3. Proof of Theorem 1.3. Assertion (1) is already established: see Proposition 4.1. It remains to complete the proof of the second assertion.

Let us set for $\delta \geq 0$

$$\mathcal{I}^c(d,N,x,\delta) := \sup\left\{ \mathcal{I}^c(X) : X \in \mathcal{X}(d,N),\ \left| \frac{H(X)}{N \log d} - x \right| \leq \delta \right\}.$$

We want to prove that

$$\mathcal{I}^c(d,x) = \lim_{N \to \infty} \frac{1}{N}\, \mathcal{I}^c(d,N,x,\delta_N)$$

for any sequence $\delta_N \geq 0$ converging to $0$ as $N \to +\infty$. We first observe that (4.5) gives that the limit exists and is equal to $\mathcal{I}^c(d,x)$ if $\delta_N = 0$ for all $N \geq 1$. Consider now a general sequence of non-negative numbers $\delta_N$ converging to zero. Obviously, $\mathcal{I}^c(d,N,x,\delta_N) \geq \mathcal{I}^c(d,N,x,0)$, so that

$$\liminf_{N \to \infty} \frac{1}{N} \left( \mathcal{I}^c(d,N,x,\delta_N) - \mathcal{I}^c(d,N,x,0) \right) \geq 0.$$

Let us prove the reverse inequality for the lim sup. Let $\epsilon > 0$. Let $X^N \in \mathcal{X}(d,N)$ realize $\mathcal{I}^c(d,N,x,\delta_N)$. Let $\delta_0$ and $N_0$ be as in Lemma 4.4. We may assume that $N \geq N_0$ and $\delta_N < \delta_0$. It follows that there is some $Y^N \in \mathcal{X}(d,N)$ with entropy $Nx \log d$ such that $\mathcal{I}^c(Y^N) \geq \mathcal{I}^c(X^N) - \epsilon N$. Hence, $\mathcal{I}^c(d,N,x,0) \geq \mathcal{I}^c(d,N,x,\delta_N) - \epsilon N$. We obtain

$$\limsup_{N \to \infty} \frac{1}{N} \left( \mathcal{I}^c(d,N,x,\delta_N) - \mathcal{I}^c(d,N,x,0) \right) \leq \epsilon.$$

Assertion (2) follows by letting $\epsilon \to 0$. □

5. Exchangeable systems 



In this section we prove Theorem 1.5, namely we prove that exchangeable systems have small intricacy. In particular, one cannot approach the maximal intricacy with such systems.

Proposition 5.1. Let $\mathcal{I}^c$ be any mutual information functional and $d \geq 2$. Then for all $\epsilon > 0$ there exists a constant $C = C(\epsilon, d)$ such that for all exchangeable $X \in \mathcal{X}(d,N)$

$$\mathcal{I}^c(X) \leq C N^{\frac{2}{3} + \epsilon}, \qquad N \geq 2. \tag{5.1}$$

In particular

$$\lim_{N \to \infty} \frac{1}{N} \max_{X \in EX(d,N)} \mathcal{I}^c(X) = 0.$$

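Proof. Fix $\epsilon > 0$. Throughout the proof, we denote by $C$ constants which only depend on $d$ and $\epsilon$ and which may change value from line to line. Also $\mathbf{k} = (k_1, \dots, k_d) \in \mathbb{N}^d$, $x := \mathbf{k}/n$ and $|\mathbf{k}| := k_1 + \cdots + k_d = n$, and the multinomial coefficients and the entropy function are denoted by:

$$\binom{n}{\mathbf{k}} := \frac{n!}{k_1!\, k_2! \cdots k_d!}, \qquad h(x) := -\sum_{i=1}^{d} x_i \log x_i.$$

We are going to use the following version of Stirling's formula

$$n! = \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n e^{\zeta_n}, \qquad \frac{1}{12n+1} < \zeta_n < \frac{1}{12n}, \qquad n \geq 1.$$

Therefore, for all $\mathbf{k} \in \mathbb{N}^d$ such that $|\mathbf{k}| = n$

$$\binom{n}{\mathbf{k}} = g(\mathbf{k}, n)\, e^{n h(x)}\, (2 \pi n)^{1/2} \prod_{i:\, x_i \neq 0} (2 \pi n x_i)^{-1/2}$$

where $g(\mathbf{k}, n) := \exp(\zeta_n - \zeta_{k_1} - \cdots - \zeta_{k_d})$ (with $\zeta_0 := 0$), therefore

$$\exp(-d) \leq g(\mathbf{k}, n) \leq \exp(1).$$

In particular, as all non-zero $x_i$ satisfy $x_i \geq 1/n$,

$$\left| \frac{1}{n} \log \binom{n}{\mathbf{k}} - h(x) \right| \leq C\, \frac{\log n}{n}. \tag{5.2}$$

Let $X \in EX(d,N)$. We set for $0 \leq n \leq N$ and $|\mathbf{k}| = n$

$$p_{n,\mathbf{k}} := P(X_1 = \cdots = X_{k_1} = 1,\ \ldots,\ X_{n - k_d + 1} = \cdots = X_n = d).$$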
These $\binom{n+d-1}{d-1}$ numbers determine the law of any subsystem $X_S$ of size $|S| = n$. It is convenient to define also $Y_i := \#\{1 \leq j \leq n : X_j = i\}$ for $i = 0, \dots, d-1$ and

$$q_{n,\mathbf{k}} := P(Y_i = k_i,\ i = 0, \dots, d-1) = \binom{n}{\mathbf{k}}\, p_{n,\mathbf{k}}.$$

Since the vector $(q_{n,\mathbf{k}})_{|\mathbf{k}| = n}$ gives the law of the vector $(Y_1, \dots, Y_d)$ we have in particular

$$\sum_{|\mathbf{k}| = n} q_{n,\mathbf{k}} = 1.$$

Second, we observe that for l^l = n 



Yi{Xs) 1 



n n 



^ gn,k /i(x) 



|k|=n 



< c 



\ogn 



n 



(5.3) 



Indeed 

H(X5) 



n 



- V gn,k log ^ = V g„,k - log ( r ) ~ ~ ^".k logg„,k 



k|=n 



|k|=n 



lk|=n 



- V g„,kMx) + |G(n)| < 



|k|=n 

where we use fl5.2D and the fact that 



gn,klogg„,k = H(Yi, . . . , Frf) < dlog 

|k|=n 



n, 



since the support of the random vector (Yi, . . . , Yd) has cardinality at most ■nf'. 

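Third, we claim that, for $\epsilon > 0$, there exists a constant $C$ such that for all $N$ and all $X \in EX(d,N)$, for all $n \in [\tilde{N}, N]$ with $\tilde{N} := \lfloor N^{\frac{2}{3} + \epsilon} + 1 \rfloor$,

$$\left| \sum_{|\mathbf{k}| = n} q_{n,\mathbf{k}}\, h(x) - \sum_{|\mathbf{K}| = N} q_{N,\mathbf{K}}\, h(X) \right| \leq C N^{-\frac{1}{3} + \epsilon} \tag{5.4}$$

where $X := \mathbf{K}/N$ (no relation with the random variable $X$). By (5.3) and (5.4) we obtain for all $n \in [\tilde{N}, N]$ and $|S| = n$

$$\left| \frac{H(X_S)}{n} - \frac{H(X)}{N} \right| \leq C N^{-\frac{1}{3} + \epsilon}. \tag{5.5}$$

Let us show how (5.5) implies (5.1). Using $H(X_S) \leq \log d \cdot |S|$ and $\sum_{S \subset I} c^I_S = 1$:

$$\sum_{|S| < \tilde{N}} c^I_S\, \mathrm{MI}(S) \leq \sum_{S \subset I} c^I_S \times \log d \cdot \tilde{N} = \log d \cdot \tilde{N}.$$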
Using ([22D, exchangeability of X, EnLo^n (^^) = 1 and fl53|) . we estimate

^ //V\ 

X^(X) < 2 ■ log ■ iV + 2 5^ f J H(X|i,...,„}) - H(X) 



n=N 

N 

< 



2 E " + C' iV-H.) _ H(A-) + CN. 



n=0 

Finally, using c^C^) = c]v_„(/J and EtoC^O = 1 



^ - 1 I H(X) + CA^ X A^-^+= + CN 



^w<(2i:c«(^) 

\ n=0 ^ ^ 



vn=0 

and (5.1) is proved.

We turn now to the proof of (5.4). We claim first that



IXrl — AT Xr^lr ^ / 



(5.6) 



|K|=Af,K>k 

Indeed, notice that 



Pn,k = ^Pn+i,k+5J-, V < n< iV, V |k| = n, 

where 6^ := [61, ... , 5^) with 6j = 1 if z = j, otherwise. This in particular yields 
(15. 6p for N = n + 1. Moreover if |K| = n + 1 then 

n + 1\ f ^ \ 

Then, arguing by induction on > n 

d 



^ fN-n\ fN-n\ 

|K|=Af,K>k ^ ^ |K|=Ar,K>k j=l ^ 

f N-n 

'\=N+1 3=1 ^ 



|K'|=Af+l,K'>k ^ 

We recall that g„,,k = (|^) p„,,k- Notice that it is enough to prove claim (15. 4p in the 

case gAT^k' = ^k'.K, i-e., P7v,k' = (^) ^ for k' = K and zero otherwise, if we find a 
constant C which does not depend on (A^, n, K). Indeed, the two expressions are 
a{N, K, n, := 9n,k = ( I x



linear and the average of CA^ i/3+e ^jjj remain of the same order. Thus, we need 
to estimate: 

'n\ fN^ fN - 

K - k, 

Let X := k/n G [0, 1]^ X := K/N G [0, l]'^ and u =: n/{N - n). Formula 
implies that ^ loga(A^, K, n, k) is equal to: 

/i(x) - (1 + u-^)h{X) + u-^h{X + z/(X - x)) +G(iV, n), 

v ' 

= :</'i/,x(x) 

where |G(A^, n)| < K(logA^)/n, for some k = K{d). 

Let us now write for all (xi, . . . , Xd-i) G [0, 1]*^^^ such that Xi < 1 

H{xi, Xd-i) := h{xi, ...,Xd), Xd := I - Xi - . . . - Xd-i- 

Observe that for i, j < d — 1 

\i=j)- 



dH _ / Xd 



1 1 
h 

Xd Xi 



In particular the Hessian of H is negative-definite, since for all a G M ^\{0} 



d-l 

E 



^ /d-l \^ d-l d-l 



\.i=l / i=l i=l 

where we use the fact that x,- < 1. Hence, h is concave and we obtain 



Z/+ 1 



V 



/^(x) 



/i((l + i^)X-z/x) -/i(X) 



<0, 



_z/ + 1 z/ + 1 

so that the maximum of 0j,^x is = </),^^x(X). The second order derivative estimate 
gives: 

fx) < -2||x-X||2 



yj^.x^x; \ — ^i|x — 7V|| where ||x|| := a/xT + h x;; 

Combining with the bound \G{N ^n)\ < K{\og N)/n above, we get, for all n < N: 

a{N,K,n,k) < N"" x e-2"ll''-x|p_ 

— 2 1 

Recall n > N = N3+^ and set S := A^~3 and 



uj:= sup ||/i(X) - /i(x)|| < C51og-. 

||x-X||<5 " 



Finally, using /i(x) < logd. 



J2 QumK^) - hiX] 

|k|=n 



<uj ^ g„,k + 2 log ^ g„,k 

||x-X||<5 ||x-X||>5 



< C6\og- + Cn'^N''e-^^^" < C(log A^)A^-^ + CA^'^+V^^' < CiV-^+^ 



Then (5.4) and the proposition are proved.



□ 
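The Stirling-type comparison between the multinomial coefficient and the entropy function, which drives the proof above, can be probed numerically. The sketch below assumes the estimate takes the form |(1/n) log of the multinomial coefficient − h(k/n)| ≤ C (log n)/n and reports the worst observed ratio for d = 3; the function names and the range of n are illustrative choices.

```python
from itertools import product
from math import lgamma, log

def log_multinomial(k):
    """log of n! / (k_1! ... k_d!) with n = sum(k), computed via log-gamma."""
    n = sum(k)
    return lgamma(n + 1) - sum(lgamma(ki + 1) for ki in k)

def h(x):
    """Entropy function h(x) = -sum x_i log x_i, with 0 log 0 = 0."""
    return -sum(xi * log(xi) for xi in x if xi > 0)

def worst_ratio(n, d):
    """max over |k| = n of |log C(n; k) / n - h(k/n)|, divided by log(n) / n."""
    worst = 0.0
    for head in product(range(n + 1), repeat=d - 1):
        if sum(head) > n:
            continue
        k = head + (n - sum(head),)
        dev = abs(log_multinomial(k) / n - h([ki / n for ki in k]))
        worst = max(worst, dev * n / log(n))
    return worst

ratios = [worst_ratio(n, d=3) for n in range(2, 41)]
print(max(ratios))
```

A bounded ratio over the tested range is consistent with a constant C depending only on d; it is of course no substitute for the proof.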
6. Small support

In this section we prove Theorem 1.6, namely we show that exact maximizers
have small support. Numerical experiments suggest that this support has in fact 
cardinality of order d^^"^. We are able to prove the following weaker estimate. For 
a fixed law $\mu \in \mathcal{M}(d,N)$, we call forbidden configurations the elements of $\Lambda_{d,N} := \{0,\dots,d-1\}^N$ with zero $\mu$-probability.

Proposition 6.1. Let $\mathcal{I}^c$ be a non-null intricacy. Let $d \geq 2$ and $N$ large enough. Let $\mu \in \mathcal{M}(d,N)$ be a maximizer of $\mathcal{I}^c$. The forbidden configurations are a lower-bounded fraction of all configurations:

$$\#\{\omega \in \Lambda_{d,N} : \mu(\omega) = 0\} \geq c(d)\, |\Lambda_{d,N}|$$

for some $c(d) > 0$ independent of $N$.

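Proof. If $\mathcal{I}^c$ is non-null, then $\lambda_c(\{0,1\}) = 2 \lambda_c(\{0\}) < 1$ and therefore $\lambda_c(\{0\}) < 1/2$. However we can without loss of generality suppose that $\lambda_c(\{0\}) = 0$: indeed it is enough to remark that

(1) the probability measure $\lambda_0 := \frac{1}{2}(\delta_0 + \delta_1)$ is associated with the null intricacy $\mathcal{I}^0 \equiv 0$,

(2) the correspondence $\lambda_c \mapsto \mathcal{I}^c$ is linear and one-to-one,

(3) we can write $\lambda_c = \alpha \lambda_0 + (1 - \alpha) \lambda_{c'}$, where

$$\alpha := 2 \lambda_c(\{0\}) < 1, \qquad \lambda_{c'}(dx) := \frac{1_{]0,1[}(x)\, \lambda_c(dx)}{1 - \alpha}.$$

Therefore $\mathcal{I}^c = \alpha \mathcal{I}^0 + (1 - \alpha) \mathcal{I}^{c'} = (1 - \alpha) \mathcal{I}^{c'}$ and $\mathcal{I}^{c'}$ has the same maximizers as $\mathcal{I}^c$ but with $\lambda_{c'}(\{0\}) = 0$.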
We fix some large integer z (how large will be explained below), N > z and d > 2 
and we consider the intricacy X^ as a function defined on the simplex Ai{d,N) = 

uigA^nP'^ — 1}- a straightforward computation yields: 



dT 



2 X] 4 ^ Pa] + logP'^ - 1 



^P'^ SCI \a=ij[S] 



where a = c^fS*] iff = Ui for all i E S. The second derivatives are: 

^P'i 5(-7 Z1q=(^[S] -Pa P^ dp^gdp^^ scI^'^='^o[S]P' 



for ujq Ui. 

Let p = {Pu)iueAd iv be a maximizer of X"^. We show that for each G {0, . . . , c? — 

ilfs := {{au ■ ■ ■ , Pi,..., Pn-z) G {0, . . . , - 1}^ : « G {0, . . . , - 1}^} 

must contain at least one configuration forbidden by p. The claim will follow since 
the cardinality of {0, . . . , - 1}^"^ is d^/d\ 
We assume by contradiction the existence of some $\beta \in \{0,\dots,d-1\}^{N-z}$ such that no configuration in $\Omega_\beta$ is forbidden. Let $\omega_0 \in \Omega_\beta$ be such that

p^o := min{p<^ : u E fi^g} > 0. 

Let now cji G fi^ \ {ujq}, which exists since {Qpl > > 2, so that p^-^ > > 0. We 
set for t G ] — e, 5[ and < e < p^ia 

{P^,+t, UJ = UJi, 
- t, LJ = UJo, 
Pu, UJ ^ {ljo,lji}. 

Then p* is still a probability measure for t G ] — e, £[, since Puj^ > p^^^ > e > 0. 
Since p is a maximizer, then ip{t) := 1^{p*') < f{0) '■= '^^{p) for t G ] — £, e[. Then 

0>^"(0) = ^ + ^-2- 



1 1 
— + 2^1 



Pa 

where [ljj\s = {a : a = u mod [S]} is the equivalence class of u. Therefore 

— + ^ A -2y 2y 



> 

and for some G fi^ 

On the other hand, we have: 

so that by Proposition 13.31 the left hand side of ( 16. ip is equal to: 

z 



J [0,1] \,=i J J [0,1] ^d, / 

J]0.1] ^" 



]0,1] 

Since we have reduced above to the case $\lambda_c(\{0\}) = 0$, then the latter expression
tends to $0$ as $z \to +\infty$, contradicting (6.1). □

Appendix A. Entropy 

In this Appendix, we recall needed facts from basic information theory. The main 
object is the entropy functional which may be said to quantify the randomness of a 
random variable. 
Let $X$ be a random variable taking values in a finite space $E$. We define the entropy of $X$

$$H(X) := -\sum_{x \in E} P_X(x) \log(P_X(x)), \qquad P_X(x) := P(X = x),$$

where we adopt the convention

$$0 \cdot \log(0) = 0 \cdot \log(+\infty) = 0.$$

We recall that

$$0 \leq H(X) \leq \log |E|. \tag{A.1}$$

More precisely, $H(X)$ is minimal iff $X$ is a constant, it is maximal iff $X$ is uniform over $E$. To prove (A.1), set $\varphi(x) := x \log x$. Since $\varphi \leq 0$ on $[0,1]$ and $\varphi(x) = 0$ if and only if $x \in \{0,1\}$, we get $H(X) = -\sum_{x \in E} \varphi(P_X(x)) \geq 0$, with equality if and only if $X$ is a constant. Moreover, by strict convexity of $\varphi$ and Jensen's inequality,

$$\log |E| - H(X) = \frac{1}{|E|} \sum_{x \in E} P_X(x)\, |E| \left( \log(P_X(x)) + \log |E| \right) = \frac{1}{|E|} \sum_{x \in E} \varphi\left(P_X(x)\, |E|\right) \geq \varphi\left( \frac{1}{|E|} \sum_{x \in E} P_X(x)\, |E| \right) = \varphi(1) = 0,$$

with $\log |E| - H(X) = 0$ if and only if $P_X(x)\, |E|$ is constant in $x \in E$.

If we have an $E$-valued random variable $X$ and an $F$-valued random variable $Y$ defined on the same probability space, with $E$ and $F$ finite, we can consider the vector $(X,Y)$ as an $E \times F$-valued random variable. The entropy of $(X,Y)$ is then

$$H(X,Y) := -\sum_{x,y} P_{(X,Y)}(x,y) \log(P_{(X,Y)}(x,y)), \qquad P_{(X,Y)}(x,y) := P(X = x, Y = y).$$

This entropy $H(X,Y)$ is a measure of the extent to which the "randomness of the two variables is shared". The following notions formalize this idea.
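The bounds (A.1) and their equality cases can be illustrated on a four-point space. This is a minimal sketch with natural logarithms; the three laws are arbitrary examples.

```python
from math import log

def entropy(probs):
    """H = -sum p log p (natural log), with the convention 0 log 0 = 0."""
    return -sum(p * log(p) for p in probs if p > 0)

size = 4
constant = [1.0, 0.0, 0.0, 0.0]      # X is a.s. constant: minimal entropy
uniform = [1 / size] * size          # X uniform over E: maximal entropy
skewed = [0.7, 0.1, 0.1, 0.1]        # something in between

for law in (constant, skewed, uniform):
    print(entropy(law), "<=", log(size))
```

The constant law gives entropy 0 and the uniform law gives log 4, the two extreme cases of (A.1).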

A.1. Conditional Entropy. The conditional entropy of $X$ given $Y$ is:

$$H(X \mid Y) := H(X,Y) - H(Y).$$

We claim that

$$0 \leq H(X \mid Y) \leq H(X) \leq H(X,Y). \tag{A.2}$$

Remark that $P_X(x)$ and $P_Y(y)$, defined in the obvious way, are the marginals of $P_{(X,Y)}(x,y)$, i.e.

$$P_X(x) = \sum_{y} P_{(X,Y)}(x,y), \qquad P_Y(y) = \sum_{x} P_{(X,Y)}(x,y).$$

In particular, $P_X(x) \geq P_{(X,Y)}(x,y)$ for all $x,y$. Therefore

$$\sum_{x,y} P_{(X,Y)}(x,y) \log\left( \frac{P_X(x)}{P_{(X,Y)}(x,y)} \right) \geq 0$$

which yields

$$H(X,Y) = -\sum_{x,y} P_{(X,Y)}(x,y) \log P_{(X,Y)}(x,y) \geq -\sum_{x} P_X(x) \log P_X(x) = H(X),$$

i.e. $H(X,Y) \geq H(X)$ and $H(Y \mid X) \geq 0$; by symmetry, $H(X \mid Y) \geq 0$. Therefore

$$H(X,Y) \geq \max\{H(X), H(Y)\}. \tag{A.3}$$

Moreover $H(X,Y) = H(X)$, i.e. $H(Y \mid X) = 0$, if and only if $P_{(X,Y)}(x,y) = P_X(x)$ whenever $P_{(X,Y)}(x,y) \neq 0$, i.e. $Y$ is a function of $X$. On the other hand,

$$H(X,Y) \leq H(X) + H(Y) \tag{A.4}$$

with equality, i.e., $H(Y \mid X) = H(Y)$, if and only if $X$ and $Y$ are independent. This shows that $H(X \mid Y) \leq H(X)$ and completes the proof of (A.2). Formula (A.4) can be shown by considering the Kullback-Leibler divergence or relative entropy:

$$I := \sum_{x,y} P_{(X,Y)}(x,y) \log\left( \frac{P_{(X,Y)}(x,y)}{P_X(x)\, P_Y(y)} \right) = H(X) + H(Y) - H(X,Y).$$

Since $\log(\cdot)$ is concave, by Jensen's inequality

$$-I \leq \log\left( \sum_{x,y} P_{(X,Y)}(x,y)\, \frac{P_X(x)\, P_Y(y)}{P_{(X,Y)}(x,y)} \right) = \log\left( \sum_{x,y} P_X(x)\, P_Y(y) \right) = 0.$$

By strict concavity, $I = 0$ if and only if $P_{(X,Y)}(x,y) = P_X(x)\, P_Y(y)$ for all $x,y$, i.e., whenever $X$ and $Y$ are independent.

By the above considerations, $H(X \mid Y) \in [0, H(X)]$ is a measure of the uncertainty associated with $X$ if $Y$ is known. It is minimal iff $X$ is a function of $Y$ and it is maximal iff $X$ and $Y$ are independent.
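The chain of inequalities (A.2)-(A.4) can be checked on any joint law of two variables. The sketch below uses one illustrative joint law on {0,1}^2; the numbers and the function names are assumptions made for the example.

```python
from math import log

def entropy(pmf):
    """Shannon entropy (natural log) of a dict {outcome: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    """Marginal law of coordinate `axis` (0 for X, 1 for Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def conditional_entropy(joint):
    """H(X|Y) := H(X,Y) - H(Y) for a joint law {(x, y): probability}."""
    return entropy(joint) - entropy(marginal(joint, 1))

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}   # illustrative numbers
h_x = entropy(marginal(joint, 0))
h_y = entropy(marginal(joint, 1))
h_xy = entropy(joint)
h_x_given_y = conditional_entropy(joint)
print(h_x_given_y, h_x, h_xy, h_x + h_y)
```

The four printed numbers appear in increasing order, which is exactly 0 ≤ H(X|Y) ≤ H(X) ≤ H(X,Y) ≤ H(X) + H(Y) for this law.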

A.2. Adding information decreases uncertainty. Let us consider three random variables $(X,Y,Z) \in E \times F \times G$ with $E, F, G$ finite. Then we have that

$$H(X \mid Y, Z) \leq H(X \mid Y). \tag{A.5}$$

Indeed, this is equivalent to

$$H(X,Y,Z) + H(Y) \leq H(X,Y) + H(Y,Z).$$

Consider the quantity

$$J := \sum_{x,y,z} P_{(X,Y,Z)}(x,y,z) \log\left( \frac{P_{(X,Y,Z)}(x,y,z)\, P_Y(y)}{P_{(X,Y)}(x,y)\, P_{(Y,Z)}(y,z)} \right).$$

Since $-\log(\cdot)$ is convex, by Jensen's inequality

$$-J \leq \log\left( \sum_{x,y} \frac{P_{(X,Y)}(x,y) \sum_{z} P_{(Y,Z)}(y,z)}{P_Y(y)} \right) = \log\left( \sum_{x,y} P_{(X,Y)}(x,y) \right) = 0$$

and the inequality follows.
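Inequality (A.5) can be stress-tested on randomly generated joint laws. The sketch below draws 200 laws of three binary variables with a fixed seed and records the smallest value of H(X|Y) − H(X|Y,Z); the sampling scheme and the function names are illustrative assumptions.

```python
import random
from itertools import product
from math import log

def entropy(pmf):
    """Shannon entropy (natural log) of a dict {outcome: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Law of the coordinates listed in `axes` (0 -> X, 1 -> Y, 2 -> Z)."""
    out = {}
    for omega, p in joint.items():
        key = tuple(omega[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def random_joint(rng):
    """A random law for (X, Y, Z) on {0,1}^3 (illustrative test data)."""
    raw = {omega: rng.random() for omega in product((0, 1), repeat=3)}
    total = sum(raw.values())
    return {omega: w / total for omega, w in raw.items()}

rng = random.Random(1)
slacks = []
for _ in range(200):
    joint = random_joint(rng)
    h_x_given_yz = entropy(joint) - entropy(marginal(joint, (1, 2)))
    h_x_given_y = entropy(marginal(joint, (0, 1))) - entropy(marginal(joint, (1,)))
    slacks.append(h_x_given_y - h_x_given_yz)
print(min(slacks))
```

Each slack is the quantity J of the proof evaluated on the sampled law, so it should never be negative beyond rounding error.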
A.3. Mutual Information. Finally, we recall the notion of mutual information between two random variables $X$ and $Y$ defined on the same probability space:

$$\mathrm{MI}(X,Y) := H(X) + H(Y) - H(X,Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = \sum_{x,y} P_{(X,Y)}(x,y) \log\left( \frac{P_{(X,Y)}(x,y)}{P_X(x)\, P_Y(y)} \right).$$

This quantity is a measure of the common randomness of $X$ and $Y$. By (A.3) and (A.4) we have $\mathrm{MI}(X,Y) \in [0, \min\{H(X), H(Y)\}]$. $\mathrm{MI}(X,Y)$ is minimal (zero) iff $X, Y$ are independent and maximal, i.e. equal to $\min\{H(X), H(Y)\}$, iff one variable is a function of the other.
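The equivalent expressions for the mutual information, and its two extreme cases, can be compared directly. The sketch below evaluates the entropy formula and the log-ratio formula on an independent pair, on a pair where Y is a copy of X, and on one generic law; all three laws are illustrative.

```python
from math import log

def entropy(pmf):
    """Shannon entropy (natural log) of a dict {outcome: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def marginal(joint, axis):
    """Marginal law of coordinate `axis` (0 for X, 1 for Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def mi_from_entropies(joint):
    """MI(X,Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)

def mi_from_ratio(joint):
    """MI(X,Y) = sum of P(x,y) * log(P(x,y) / (P_X(x) P_Y(y)))."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return sum(p * log(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
copy = {(0, 0): 0.5, (1, 1): 0.5}             # Y is a function of X
generic = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

for joint in (independent, copy, generic):
    print(mi_from_entropies(joint), mi_from_ratio(joint))
```

The independent pair gives 0 and the copied pair gives log 2 = min{H(X), H(Y)}, matching the minimal and maximal cases described above.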

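Mutual information is non-decreasing. Let $X, X', Y, Y', \tilde{X}, \tilde{Y}$ be random variables such that $X, X'$, resp. $Y, Y'$, are (deterministic) functions of $\tilde{X}$, resp. $\tilde{Y}$. Then:

$$\mathrm{MI}(X,Y) \leq \mathrm{MI}(\tilde{X}, \tilde{Y}). \tag{A.6}$$

Mutual information is almost additive:

$$|\mathrm{MI}((X,Y), (X',Y')) - (\mathrm{MI}(X,X') + \mathrm{MI}(Y,Y'))| \leq \mathrm{MI}(\tilde{X}, \tilde{Y}). \tag{A.7}$$

These properties follow from the properties of conditional entropy. First,

$$\mathrm{MI}(\tilde{X}, \tilde{Y}) = H(\tilde{X}) + H(\tilde{Y}) - H(\tilde{X}, \tilde{Y})$$

$$= H(X) + H(\tilde{X} \mid X) + H(Y) + H(\tilde{Y} \mid Y) - H(X,Y) - H(\tilde{X} \mid X, Y) - H(\tilde{Y} \mid \tilde{X}, Y)$$

$$= \mathrm{MI}(X,Y) + \left( H(\tilde{X} \mid X) - H(\tilde{X} \mid X, Y) \right) + \left( H(\tilde{Y} \mid Y) - H(\tilde{Y} \mid \tilde{X}, Y) \right).$$

(A.6) now follows from (A.5). Second,

$$\mathrm{MI}((X,Y), (X',Y')) = H(X,Y) + H(X',Y') - H(X,X',Y,Y')$$

$$= H(X) + H(Y) - \mathrm{MI}(X,Y) + H(X') + H(Y') - \mathrm{MI}(X',Y') - H(X,X') - H(Y,Y') + \mathrm{MI}((X,X'), (Y,Y'))$$

$$= H(X) + H(X') - H(X,X') + H(Y) + H(Y') - H(Y,Y') + \left( \mathrm{MI}((X,X'), (Y,Y')) - \mathrm{MI}(X,Y) - \mathrm{MI}(X',Y') \right)$$

$$= \mathrm{MI}(X,X') + \mathrm{MI}(Y,Y') + \left( \mathrm{MI}((X,X'), (Y,Y')) - \mathrm{MI}(X,Y) - \mathrm{MI}(X',Y') \right).$$

The nonnegativity of mutual information and (A.6) yield

$$-\min(\mathrm{MI}(X,Y), \mathrm{MI}(X',Y')) \leq \mathrm{MI}((X,Y), (X',Y')) - (\mathrm{MI}(X,X') + \mathrm{MI}(Y,Y')) \leq \mathrm{MI}((X,X'), (Y,Y')).$$

(A.7) follows.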
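Both (A.6) and (A.7) can be tested on random laws of four binary variables. The sketch below makes the simplest admissible choice, taking the pair (X, X') itself as the first "big" variable and the pair (Y, Y') as the second, so that the coordinates are deterministic functions of them; the sampling scheme and the names are illustrative assumptions.

```python
import random
from itertools import product
from math import log

def entropy(pmf):
    """Shannon entropy (natural log) of a dict {outcome: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

def marginal(joint, axes):
    """Law of the coordinates listed in `axes`."""
    out = {}
    for omega, p in joint.items():
        key = tuple(omega[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def mi(joint, left, right):
    """MI between the coordinate groups `left` and `right` of a joint law."""
    return (entropy(marginal(joint, left)) + entropy(marginal(joint, right))
            - entropy(marginal(joint, left + right)))

# Coordinates: 0 -> X, 1 -> X', 2 -> Y, 3 -> Y'.
rng = random.Random(2)
worst_a6, worst_a7 = float("inf"), float("inf")
for _ in range(200):
    raw = {omega: rng.random() for omega in product((0, 1), repeat=4)}
    total = sum(raw.values())
    joint = {omega: w / total for omega, w in raw.items()}
    big = mi(joint, (0, 1), (2, 3))                        # MI((X,X'), (Y,Y'))
    worst_a6 = min(worst_a6, big - mi(joint, (0,), (2,)))  # slack in (A.6)
    gap = mi(joint, (0, 2), (1, 3)) - (mi(joint, (0,), (1,)) + mi(joint, (2,), (3,)))
    worst_a7 = min(worst_a7, big - abs(gap))               # slack in (A.7)
print(worst_a6, worst_a7)
```

Non-negative slacks, up to rounding, on every sampled law are what the two inequalities predict.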